ASSIGNMENT

For valuable consideration, we, Brian Seed, of Boston, Massachusetts; and Jurgen Haas,
of Schriesheim, Germany, hereby assign to THE GENERAL HOSPITAL CORPORATION, a



MASSACHUSETTS corporation having a place of business at 55 Fruit Street Boston, MA 02114, and 

its successors and assigns (collectively hereinafter called "the Assignee"), the entire right, title and 

interest throughout the world in the inventions and improvements which are subject of an application 

for United States Patent signed by us, entitled HIGH LEVEL EXPRESSION OF PROTEINS, filed 

September 20, 1996, and assigned U.S. Serial Number 08/717,294, and we authorize and request the
attorneys appointed in said application to hereafter complete this assignment by inserting above the
filing date and serial number of said application when known; this assignment including said
application, any and all United States and foreign patents, utility models, and design registrations 
granted for any of said inventions or improvements, and the right to claim priority based on the filing 
date of said application under the International Convention for the Protection of Industrial Property, 
the Patent Cooperation Treaty, the European Patent Convention, and all other treaties of like 
purposes; and we authorize the Assignee to apply in all countries in our name or in its own name for 
patents, utility models, and design registrations and like rights of exclusion and for inventors' 
certificates for said inventions and improvements; and we agree for ourselves and our respective heirs, 
legal representatives and assigns, without further compensation to perform such lawful acts and to sign
such further applications, assignments, Preliminary Statements and other lawful documents as the
Assignee may reasonably request to effectuate fully this assignment.
this [illegible] day of [illegible], 19[illegible]




L.S. 



BRIAN 

STATE OF [illegible]

County of [illegible]



:SS. 



Before me this [illegible] day of [illegible], personally appeared

[illegible], known to me to be the person whose name is subscribed to the

foregoing Assignment, and acknowledged that he/she executed the same as his/her free act and deed 
for the purposes therein contained.
Notary Public



My Commission Expires: [illegible]

[Notary's Seal Here]
IN WITNESS WHEREOF, I hereto set my hand and seal at
this day of , 19 



L.S. 



JURGEN HAAS 

State of : 

:SS. 

County of : 



Before me this day of , 19 , personally appeared 

known to me to be the person whose name is subscribed to the 



foregoing Assignment, and acknowledged that he/she executed the same as his/her free act and deed 
for the purposes therein contained. 



Notary Public 

My Commission Expires: 

[Notary's Seal Here]
Assistant Commissioner of Patents and Trademarks
Washington, DC 20231
Effective immediately, please change the address in the above matter to:

Clark & Elbing LLP
585 Commercial Street
Boston, MA 02109-1024 

Please address all correspondence and telephone calls to Karen Lech Elbing 
(617) 723-4197 at the new address. The telephone and facsimile numbers are: (617) 723-6777 
and (617) 723-8962, respectively.
Title



Date of Deposit: [illegible]

I hereby certify under 37 CFR 1.8(a) that this correspondence is being
deposited with the United States Postal Service as first class mail with
sufficient postage on the date indicated above and is addressed to the
Assistant Commissioner of Patents and Trademarks, Washington, D.C.
20231.




This address is effective through February 1997. Prior to that time, we will inform you of our 
permanent address. 



Clark & Elbing LLP 
585 Commercial Street 
Boston, MA 02109-1024 
Telephone: (617) 723-6777 
Facsimile: (617) 723-8962 



Respectfully submitted,
Art Unit:
Examiner: 



Assistant Commissioner of Patents and Trademarks 
Washington, DC 20231 

PETITION UNDER 37 CFR § 1.47
Applicants hereby submit this petition under 37 CFR § 1.47. As stated in the
accompanying Declarations of Dr. Brian Seed and Ms. Susan M. Cuffe, one of the inventors of 
the above-identified patent application, Jurgen Haas, cannot be reached despite diligent effort, 
and applicants request that this application be considered complete even in the absence of his 
signature on an oath or declaration. As stated in the accompanying Declarations, the last known 
address of Dr. Haas was Huberweg 13, 69198 Schriesheim, Germany. 



Submitted herewith is a check for the required fee under 37 CFR § 1.17(h). 



Date: [illegible]



Clark & Elbing LLP
Boston, MA 02109-1024
Payment of the surcharge of $130.00 for late filing of the declaration.



Date of Deposit

I hereby certify under 37 CFR 1.8(a) that this correspondence is being
deposited with the United States Postal Service as first class mail with
sufficient postage on the date indicated above and is addressed to the
Assistant Commissioner of Patents and Trademarks, Washington, D.C.



It is understood that this perfects the application and no additional papers or filing fees 
are required. If there are any other charges, or any credits, please apply them to Deposit Account 



Clark & Elbing LLP 
585 Commercial Street 
Boston, MA 02109-1024 
Telephone: 617-723-6777 
Facsimile: 617-723-8962
Applicant: BRIAN SEED AND JURGEN HAAS

Title : HIGH LEVEL EXPRESSION OF PROTEINS
Basic filing fee
PATENT
ATTORNEY DOCKET NO: 00786/345001



HIGH LEVEL EXPRESSION OF PROTEINS 



Field of the Invention 
The invention concerns genes and methods for
expressing eukaryotic and viral proteins at high levels in
eukaryotic cells.

Background of the Invention 
Expression of eukaryotic gene products in
prokaryotes is sometimes limited by the presence of codons
that are infrequently used in E. coli. Expression of such
genes can be enhanced by systematic substitution of the
endogenous codons with codons overrepresented in highly
expressed prokaryotic genes (Robinson et al., Nucleic Acids
Res. 12:6663, 1984). It is commonly supposed that rare
codons cause pausing of the ribosome, which leads to a
failure to complete the nascent polypeptide chain and an
uncoupling of transcription and translation. Pausing of the
ribosome is thought to lead to exposure of the 3' end of the
mRNA to cellular ribonucleases.

Summary of the Invention 
The invention features a synthetic gene encoding a 
protein normally expressed in a mammalian cell or other 
eukaryotic cell wherein at least one non-preferred or less 
preferred codon in the natural gene encoding the protein has
been replaced by a preferred codon encoding the same amino 
acid. 

Preferred codons are: Ala (gcc); Arg (cgc); Asn
(aac); Asp (gac); Cys (tgc); Gln (cag); Gly (ggc); His (cac);
Ile (atc); Leu (ctg); Lys (aag); Pro (ccc); Phe (ttc); Ser
(agc); Thr (acc); Tyr (tac); and Val (gtg). Less preferred
codons are: Gly (ggg); Ile (att); Leu (ctc); Ser (tcc); Val
(gtc); and Arg (agg). All codons which do not fit the
description of preferred codons or less preferred codons are
non-preferred codons. In general, the degree of preference
of a particular codon is indicated by the prevalence of the
codon in highly expressed human genes as indicated in Table
1 under the heading "High." For example, "atc" represents
77% of the Ile codons in highly expressed mammalian genes
and is the preferred Ile codon; "att" represents 18% of the
Ile codons in highly expressed mammalian genes and is the
less preferred Ile codon. The sequence "ata" represents
only 5% of the Ile codons in highly expressed human genes and
is a non-preferred Ile codon. Replacing a codon with
another codon that is more prevalent in highly expressed
human genes will generally increase expression of the gene
in mammalian cells. Accordingly, the invention includes
replacing a less preferred codon with a preferred codon as
well as replacing a non-preferred codon with a preferred or
less preferred codon.
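
The three-way classification described above can be sketched in Python as follows. This is only an illustration: the helper name `classify_codon` is hypothetical, and the two codon sets are transcribed from the lists given in this section.

```python
# Codon sets transcribed from the lists in the text (lower-case DNA).
PREFERRED = {
    "gcc", "cgc", "aac", "gac", "tgc", "cag", "ggc", "cac", "atc",
    "ctg", "aag", "ccc", "ttc", "agc", "acc", "tac", "gtg",
}
LESS_PREFERRED = {"ggg", "att", "ctc", "tcc", "gtc", "agg"}


def classify_codon(codon: str) -> str:
    """Return 'preferred', 'less preferred', or 'non-preferred'.

    Hypothetical helper. Taken literally, the text defines non-preferred
    codons by exclusion, so codons with no listed alternative (for example
    atg for Met or tgg for Trp) also fall into that category here.
    """
    c = codon.lower()
    if len(c) != 3 or set(c) - set("acgt"):
        raise ValueError(f"not a DNA codon: {codon!r}")
    if c in PREFERRED:
        return "preferred"
    if c in LESS_PREFERRED:
        return "less preferred"
    return "non-preferred"


# The three Ile codons discussed in the text.
for ile_codon in ("atc", "att", "ata"):
    print(ile_codon, classify_codon(ile_codon))
```
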

By "protein normally expressed in a mammalian cell" 
is meant a protein which is expressed in a mammalian cell under
natural conditions. The term includes genes in the
mammalian genome such as those encoding Factor VIII, Factor
IX, interleukins, and other proteins. The term also
includes genes which are expressed in a mammalian cell under 
disease conditions such as oncogenes as well as genes which 
are encoded by a virus (including a retrovirus) which are 
expressed in mammalian cells post-infection. By "protein 
normally expressed in a eukaryotic cell" is meant a protein 
which is expressed in a eukaryote under natural conditions. 
The term also includes genes which are expressed in a 
mammalian cell under disease conditions. 

In preferred embodiments, the synthetic gene is
capable of expressing the mammalian or eukaryotic protein at
a level which is at least 110%, 150%, 200%, 500%, 1,000%,
5,000% or even 10,000% of that expressed by the "natural"
(or "native") gene in an in vitro mammalian cell culture
system under identical conditions (i.e., same cell type,
same culture conditions, same expression vector).

Suitable cell culture systems for measuring 
expression of the synthetic gene and corresponding natural 
gene are described below. Other suitable expression systems 
employing mammalian cells are well known to those skilled in 
the art and are described in, for example, the standard 
molecular biology reference works noted below. Vectors 
suitable for expressing the synthetic and natural genes are 
described below and in the standard reference works 
described below. By "expression" is meant protein 
expression. Expression can be measured using an antibody 
specific for the protein of interest. Such antibodies and 
measurement techniques are well known to those skilled in 
the art. By "natural gene" and "native gene" is meant the 
gene sequence (including naturally occurring allelic 
variants) which naturally encodes the protein, i.e., the 
native or natural coding sequence. 

In other preferred embodiments at least 10%, 20%, 
30%, 40%, 50%, 60%, 70%, 80%, or 90% of the codons in the 
natural gene are non-preferred codons. 

In other preferred embodiments at least 10%, 20%, 
30%, 40%, 50%, 60%, 70%, 80%, or 90% of the non-preferred 
codons in the natural gene are replaced with preferred 
codons or less preferred codons. 

In other preferred embodiments at least 10%, 20%, 
30%, 40%, 50%, 60%, 70%, 80%, or 90% of the non-preferred 
codons in the natural gene are replaced with preferred 
codons.

In a preferred embodiment the protein is a
retroviral protein. In a more preferred embodiment the
protein is a lentiviral protein. In an even more preferred
embodiment the protein is an HIV protein. In other
preferred embodiments the protein is gag, pol, env, gp120,
or gp160. In other preferred embodiments the protein is a
human protein. In more preferred embodiments, the protein
is human Factor VIII or B region deleted
human Factor VIII. In another preferred embodiment the
protein is green fluorescent protein.

In various preferred embodiments at least 30%, 40%, 
50%, 60%, 70%, 80%, 90%, and 95% of the codons in the 
synthetic gene are preferred or less preferred codons. 

The invention also features an expression vector 
comprising the synthetic gene. 

In another aspect the invention features a cell 
harboring the synthetic gene. In various preferred
embodiments the cell is a prokaryotic cell or a
mammalian cell.

In preferred embodiments the synthetic gene includes 
fewer than 50, fewer than 40, fewer than 30, fewer than 20, 
fewer than 10, fewer than 5, or no "cg" sequences.

The invention also features a method for preparing a 
synthetic gene encoding a protein normally expressed by a 
mammalian cell or other eukaryotic cell. The method 
includes identifying non-preferred and less-preferred codons 
in the natural gene encoding the protein and replacing one 
or more of the non-preferred and less-preferred codons with 
a preferred codon encoding the same amino acid as the 
replaced codon. 
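
As a rough sketch of the method just described, the following Python replaces every codon that has a listed preferred synonym with that preferred codon and then checks that the encoded protein is unchanged. The function names are hypothetical, full replacement is only one of the options the text contemplates, and codons for amino acids without a listed preferred codon are simply left as they are.

```python
from itertools import product

# Standard genetic code, built in the conventional t/c/a/g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Preferred codon per amino acid, transcribed from the Summary.
PREFERRED_BY_AA = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc", "Q": "cag",
    "G": "ggc", "H": "cac", "I": "atc", "L": "ctg", "K": "aag", "P": "ccc",
    "F": "ttc", "S": "agc", "T": "acc", "Y": "tac", "V": "gtg",
}


def translate(seq: str) -> str:
    """Translate an in-frame DNA coding sequence to one-letter amino acids."""
    seq = seq.lower()
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def replace_with_preferred(seq: str) -> str:
    """Swap each codon for the preferred synonymous codon, where one is listed."""
    seq = seq.lower()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        out.append(PREFERRED_BY_AA.get(CODON_TO_AA[codon], codon))
    return "".join(out)


native = "atgataagatta"           # toy sequence: Met-Ile-Arg-Leu
synthetic = replace_with_preferred(native)
assert translate(synthetic) == translate(native)
print(synthetic)
```

A partial-replacement variant would apply the same lookup to only a chosen subset of codon positions.
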

Under some circumstances (e.g., to permit 
introduction of a restriction site) it may be desirable to 
replace a non-preferred codon with a less preferred codon 
rather than a preferred codon. 

It is not necessary to replace all less preferred or
non-preferred codons with preferred codons. Increased
expression can be accomplished even with partial replacement
of less preferred or non-preferred codons with preferred
codons. Under some circumstances it may be desirable to
only partially replace non-preferred codons with preferred
or less preferred codons in order to obtain an intermediate
level of expression.

In other preferred embodiments the invention 
features vectors (including expression vectors) comprising 
one or more of the synthetic genes.

By "vector" is meant a DNA molecule, derived, e.g., 
from a plasmid, bacteriophage, or mammalian or insect virus, 
into which fragments of DNA may be inserted or cloned. A 
vector will contain one or more unique restriction sites and 
may be capable of autonomous replication in a defined host 
or vehicle organism such that the cloned sequence is 
reproducible. Thus, by "expression vector" is meant any 
autonomous element capable of directing the synthesis of a 
protein. Such DNA expression vectors include mammalian 
plasmids and viruses. 

The invention also features synthetic gene fragments 
which encode a desired portion of the protein. Such 
synthetic gene fragments are similar to the synthetic genes 
of the invention except that they encode only a portion of 
the protein. Such gene fragments preferably encode at least 
50, 100, 150, or 500 contiguous amino acids of the protein. 

In constructing the synthetic genes of the invention 
it may be desirable to avoid CpG sequences as these 
sequences may cause gene silencing. Thus, in a preferred 
embodiment the coding region of the synthetic gene does not 
include the sequence "cg."
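
Counting "cg" dinucleotides in a candidate coding region is straightforward; the sketch below uses hypothetical helper names and is offered only as an illustration. It also counts the "cg" pairs that straddle a codon boundary, since a codon ending in "c" followed by a codon beginning with "g" creates a "cg" even when neither codon contains one.

```python
def count_cg(seq: str) -> int:
    """Count occurrences of the dinucleotide "cg" in a DNA sequence."""
    return seq.lower().count("cg")


def count_cg_at_codon_junctions(seq: str) -> int:
    """Count "cg" pairs formed by the last base of one codon and the
    first base of the next (reading frame assumed to start at index 0)."""
    s = seq.lower()
    return sum(1 for i in range(3, len(s), 3) if s[i - 1] == "c" and s[i] == "g")


# gcc (Ala) followed by ggc (Gly): neither codon contains "cg",
# but the junction between them does.
print(count_cg("gccggc"), count_cg_at_codon_junctions("gccggc"))
```
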

The codon bias present in the HIV gp120 env gene is
also present in the gag and pol genes. Thus, replacement of
a portion of the non-preferred and less preferred codons
found in these genes with preferred codons should produce a
gene capable of higher level expression. A large fraction
of the codons in the human genes encoding Factor VIII and
Factor IX are non-preferred codons or less preferred codons.
Replacement of a portion of these codons with preferred
codons should yield genes capable of higher level expression
in mammalian cell culture.

The synthetic genes of the invention can be
introduced into the cells of a living organism. For
example, vectors (viral or non-viral) can be used to
introduce a synthetic gene into cells of a living organism
for gene therapy.

Conversely, it may be desirable to replace preferred
codons in a naturally occurring gene with less-preferred
codons as a means of lowering expression.

Standard reference works describing the general
principles of recombinant DNA technology include Watson et
al., Molecular Biology of the Gene, Volumes I and II, the
Benjamin/Cummings Publishing Company, Inc., publisher, Menlo
Park, CA (1987); Darnell et al., Molecular Cell Biology,
Scientific American Books, Inc., Publisher, New York, N.Y.
(1986); Old et al., Principles of Gene Manipulation: An
Introduction to Genetic Engineering, 2d edition, University
of California Press, publisher, Berkeley, CA (1981);
Maniatis et al., Molecular Cloning: A Laboratory Manual,
2nd Ed. Cold Spring Harbor Laboratory, publisher, Cold
Spring Harbor, NY (1989); and Current Protocols in Molecular
Biology, Ausubel et al., Wiley Press, New York, NY (1992).

By "transformed cell" is meant a cell into which (or
into an ancestor of which) has been introduced, by means of
recombinant DNA techniques, a selected DNA molecule, e.g., a
synthetic gene.

By "positioned for expression" is meant that a DNA
molecule, e.g., a synthetic gene, is positioned adjacent to
a DNA sequence which directs transcription and translation
of the sequence (i.e., facilitates the production of the
protein encoded by the synthetic gene).

Description of the Drawings 

Figure 1 depicts the sequence of the synthetic gp120
and a synthetic gp160 gene in which codons have been
replaced by those found in highly expressed human genes.

Figure 2 is a schematic drawing of the synthetic
gp120 (HIV-1 MN) gene. The shaded portions marked v1 to v5
indicate hypervariable regions. The filled box indicates
the CD4 binding site. A limited number of the unique
restriction sites are shown: H (Hind3), Nh (Nhe1), P
(Pst1), Na (Nae1), M (Mlu1), R (EcoR1), A (Age1) and No
(Not1). The chemically synthesized DNA fragments which
served as PCR templates are shown below the gp120 sequence,
along with the locations of the primers used for their
amplification.

Figure 3 is a photograph of the results of transient
transfection assays used to measure gp120 expression. Gel
electrophoresis of immunoprecipitated supernatants of 293T
cells transfected with plasmids expressing gp120 encoded by
the IIIB isolate of HIV-1 (gp120IIIb), by the MN isolate of
HIV-1 (gp120mn), by the MN isolate of HIV-1 modified by
substitution of the endogenous leader peptide with that of
the CD5 antigen (gp120mnCD5L), or by the chemically
synthesized gene encoding the MN variant of HIV-1 with the
human CD5 leader (syngp120mn). Supernatants were harvested
following a 12 hour labeling period 60 hours post-transfection
and immunoprecipitated with CD4:IgG1 fusion
protein and protein A sepharose.



Figure 4 is a graph depicting the results of ELISA
assays used to measure protein levels in supernatants of
transiently transfected 293T cells. Supernatants of 293T
cells transfected with plasmids expressing gp120 encoded by
the IIIB isolate of HIV-1 (gp120IIIb), by the MN isolate of
HIV-1 (gp120mn), by the MN isolate of HIV-1 modified by
substitution of the endogenous leader peptide with that of
CD5 antigen (gp120mnCD5L), or by the chemically synthesized
gene encoding the MN variant of HIV-1 with human CD5 leader
(syngp120mn) were harvested after 4 days and tested in a
gp120/CD4 ELISA. The level of gp120 is expressed in ng/ml.

Figure 5A is a photograph of a gel illustrating the
results of an immunoprecipitation assay used to measure
expression of the native and synthetic gp120 in the presence
of rev in trans and the RRE in cis. In this experiment 293T
cells were transiently transfected by calcium phosphate
co-precipitation of 10 µg of plasmid expressing: (A) the
synthetic gp120MN sequence and RRE in cis, (B) the gp120
portion of HIV-1 IIIB, (C) the gp120 portion of HIV-1 IIIB
and RRE in cis, all in the presence or absence of rev
expression. The RRE constructs gp120IIIbRRE and
syngp120mnRRE were generated using an Eag1/Hpa1 RRE fragment
cloned by PCR from an HIV-1 HXB2 proviral clone. Each gp120
expression plasmid was cotransfected with 10 µg of either
pCMVrev or CDM7 plasmid DNA. Supernatants were harvested 60
hours post transfection, immunoprecipitated with CD4:IgG
fusion protein and protein A agarose, and run on a 7%
reducing SDS-PAGE. The gel exposure time was extended to
allow the induction of gp120IIIbrre by rev to be
demonstrated.

Figure 5B is a shorter exposure of a similar
experiment in which syngp120mnrre was cotransfected with or
without pCMVrev.



Figure 5C is a schematic diagram of the constructs 
used in Figure 5A. 

Figure 6 is a comparison of the sequence of the 
wild-type ratTHY-1 gene (wt) and a synthetic ratTHY-1 gene 
(env) constructed by chemical synthesis and having the most 
prevalent codons found in the HIV-1 env gene. 

Figure 7 is a schematic diagram of the synthetic 
ratTHY-1 gene. The solid black box denotes the signal 
peptide. The shaded box denotes the sequences in the 
precursor which direct the attachment of a phosphatidyl-inositol
glycan anchor. Unique restriction sites used for
assembly of the THY-1 constructs are marked H (Hind3), M
(Mlu1), S (Sac1) and No (Not1). The positions of the
synthetic oligonucleotides employed in the construction are
shown at the bottom of the figure.

Figure 8 is a graph depicting the results of flow
cytometry analysis. In this experiment 293T cells were
transiently transfected with either a wild-type ratTHY-1
expression plasmid (thick line), ratTHY-1 with envelope
codons expression plasmid (thin line), or vector only
(dotted line) by calcium phosphate co-precipitation. Cells
were stained with anti-ratTHY-1 monoclonal antibody OX7
followed by a polyclonal FITC-conjugated anti-mouse IgG
antibody 3 days after transfection.

Figure 9A is a photograph of a gel illustrating the
results of immunoprecipitation analysis of supernatants of
human 293T cells transfected with either syngp120mn (A) or a
construct syngp120mn.rTHY-1env which has the rTHY-1env gene
in the 3' untranslated region of the syngp120mn gene (B).
The syngp120mn.rTHY-1env construct was generated by
inserting a Not1 adapter into the blunted Hind3 site of the
rTHY-1env plasmid. Subsequently, a 0.5 kb Not1 fragment
containing the rTHY-1env gene was cloned into the Not1 site
of the syngp120mn plasmid and tested for correct
orientation. Supernatants of 35S-labeled cells were
harvested 72 hours post transfection, precipitated with
CD4:IgG fusion protein and protein A agarose, and run on a
7% reducing SDS-PAGE.

Figure 9B is a schematic diagram of the constructs 
used in the experiment depicted in Figure 9A. 

Figure 10A is a photograph of COS cells transfected 
with vector only showing no GFP fluorescence. 

Figure 10B is a photograph of COS cells transfected
with a CDM7 expression plasmid encoding native GFP
engineered to include a consensus translational initiation
sequence.

Figure 10C is a photograph of COS cells transfected
with an expression plasmid having the same flanking
sequences and initiation consensus as in Figure 10B, but
bearing a codon optimized gene sequence.

Figure 10D is a photograph of COS cells transfected
with an expression plasmid as in Figure 10C, but bearing a
Thr at residue 65 in place of Ser.

Figure 11 depicts the sequence of a synthetic gene
encoding green fluorescent protein (SEQ ID NO:40).

Figure 12 depicts the sequence of a native human 
Factor VIII gene lacking the central B domain (amino acids 
760-1639, inclusive) (SEQ ID NO:41). 

Figure 13 depicts the sequence of a synthetic human 
Factor VIII gene lacking the central B domain (amino acids 
760-1639, inclusive) (SEQ ID NO:42).

Description of the Preferred Embodiments

EXAMPLE 1

Construction of a Synthetic gp120 Gene Having Codons Found
in Highly Expressed Human Genes

A codon frequency table for the envelope precursor
of the LAV subtype of HIV-1 was generated using software
developed by the University of Wisconsin Genetics Computer
Group. The results of that tabulation are contrasted in
Table 1 with the pattern of codon usage by a collection of
highly expressed human genes. For any amino acid encoded by
degenerate codons, the most favored codon of the highly
expressed genes is different from the most favored codon of
the HIV envelope precursor. Moreover, a simple rule
describes the pattern of favored envelope codons wherever it
applies: preferred codons maximize the number of
adenine residues in the viral RNA. In all cases but one
this means that the codon in which the third position is A
is the most frequently used. In the special case of serine,
three codons equally contribute one A residue to the mRNA;
together these three comprise 85% of the serine codons
actually used in envelope transcripts. A particularly
striking example of the A bias is found in the codon choice
for arginine, in which the AGA triplet comprises 88% of the
arginine codons. In addition to the preponderance of A
residues, a marked preference is seen for uridine among
degenerate codons whose third residue must be a pyrimidine.
Finally, the inconsistencies among the less frequently used
variants can be accounted for by the observation that the
dinucleotide CpG is underrepresented; thus the third
position is less likely to be G whenever the second position
is C, as in the codons for alanine, proline, serine and
threonine; and the CGX triplets for arginine are hardly used
at all.

TABLE 1: Codon Frequency in the HIV-1 IIIb env gene and
in highly expressed human genes.

Amino acid   Codon   High   Env

Ala          GCC      53     27
             GCT      17     18
             GCA      13     50
             GCG      17      5

Arg          CGC      37      0
             CGT      [illegible]
             CGA       6      0
             CGG      21      0
             AGA      10     88
             AGG      18      8

Asn          AAC      78     30
             AAT      22     70

Asp          GAC      75     33
             GAT      25     67

Cys          TGC      68     16
             TGT      32     84

Gln          CAA      12     55
             CAG      88     45

Glu          GAA      25     67
             GAG      75     33

Gly          GGC      50      6
             GGT      12     13
             GGA      14     53
             GGG      24     28

His          CAC      79     25
             CAT      21     75

Ile          ATC      77     25
             ATT      18     31
             ATA       5     44

Leu          CTC      26     10
             CTT       5      7
             CTA       3     17
             CTG      58     17
             TTA       2     30
             TTG       6     20

Lys          AAA      18     68
             AAG      82     32

Phe          TTC      80     26
             TTT      20     74

Pro          CCC      48     27
             CCT      19     14
             CCA      16     55
             CCG      17      5

Ser          TCC      28      8
             TCT      13      8
             TCA       5     22
             TCG      [illegible]
             AGC      34     22
             AGT      10     41

Thr          ACC      57     20
             ACT      14     22
             ACA      14     51
             ACG      15      7

Tyr          TAC      74      8
             TAT      26     92

Val          GTC      25     12
             GTT       7      9
             GTA       5     62
             GTG      64     18

Codon frequency was calculated using the GCG program 
established by the University of Wisconsin Genetics Computer
Group. Numbers represent the percentage of cases in which 
the particular codon is used. Codon usage frequencies of 
envelope genes of other HIV-1 virus isolates are comparable 
and show a similar bias. 
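
For readers who want to reproduce this kind of tabulation, the following Python sketch computes, for each amino acid, the percentage of cases in which each synonymous codon is used. It is not the GCG program mentioned above; the function name `codon_usage` and the toy input sequence are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built in the conventional t/c/a/g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(seq: str) -> dict:
    """Map amino acid -> {codon: percentage of that amino acid's codons}.

    Percentages are rounded to whole numbers, as in Table 1. Any trailing
    bases that do not form a complete codon are ignored.
    """
    seq = seq.lower()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TO_AA[codon]] += n
    usage = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODON_TO_AA[codon]
        usage[aa][codon] = round(100 * n / totals[aa])
    return dict(usage)


# Toy sequence: Arg (aga), Arg (aga), Arg (cgt), Lys (aaa).
print(codon_usage("agaagacgtaaa"))
```
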



In order to produce a gp120 gene capable of high
level expression in mammalian cells, a synthetic gene
encoding the gp120 segment of HIV-1 was constructed
(syngp120mn), based on the sequence of the most common North
American subtype, HIV-1 MN (Shaw et al., Science 226:1165,
1984; Gallo et al., Nature 321:119, 1986). In this
synthetic gp120 gene nearly all of the native codons have
been systematically replaced with codons most frequently
used in highly expressed human genes (Figure 1). This
synthetic gene was assembled from chemically synthesized 
oligonucleotides of 150 to 200 bases in length. If 
oligonucleotides exceeding 120 to 150 bases are chemically 
synthesized, the percentage of full-length product can be 
low, and the vast excess of material consists of shorter 
oligonucleotides. Since these shorter fragments inhibit 
cloning and PCR procedures, it can be very difficult to use 
oligonucleotides exceeding a certain length. In order to 
use crude synthesis material without prior purification, 
single-stranded oligonucleotide pools were PCR amplified 
before cloning. PCR products were purified in agarose gels 
and used as templates in the next PCR step. Two adjacent
fragments could be co-amplified because of overlapping
sequences at the end of either fragment. These fragments,
which were between 350 and 400 bp in size, were subcloned
into a pCDM7-derived plasmid containing the leader sequence
of the CD5 surface molecule followed by a
Nhe1/Pst1/Mlu1/EcoR1/BamH1 polylinker. Each of the
restriction enzymes in this polylinker represents a site
that is present at either the 5' or 3' end of the PCR-generated
fragments. Thus, by sequential subcloning of each
of the 4 long fragments, the whole gp120 gene was assembled.
For each fragment three to six different clones were
subcloned and sequenced prior to assembly. A schematic
drawing of the method used to construct the synthetic gp120
is shown in Figure 2. The sequence of the synthetic gp120
gene (and a synthetic gp160 gene created using the same
approach) is presented in Figure 1.

The mutation rate was considerable. The most 
commonly found mutations were short (1 nucleotide) and long 
(up to 30 nucleotides) deletions. In some cases it was 
necessary to exchange parts with either synthetic adapters 
or pieces from other subclones without mutation in that 
particular region. Some deviations from strict adherence to 
optimized codon usage were made to accommodate the 
introduction of restriction sites into the resulting gene to 
facilitate the replacement of various segments (Figure 2).
These unique restriction sites were introduced into the gene 
at approximately 100 bp intervals. The native HIV leader 
sequence was exchanged with the highly efficient leader 
peptide of the human CD5 antigen to facilitate secretion
(Aruffo et al., Cell 61:1303, 1990). The plasmid used for
construction is a derivative of the mammalian expression 
vector pCDM7 transcribing the inserted gene under the 
control of a strong human CMV immediate early promoter.

To compare the wild-type and synthetic gp120 coding
sequences, the synthetic gp120 coding sequence was inserted
into a mammalian expression vector and tested in transient
transfection assays. Several different native gp120 genes
were used as controls to exclude variations in expression
levels between different virus isolates and artifacts
induced by distinct leader sequences. The gp120 HIV IIIb
construct used as control was generated by PCR using a
Sal1/Xho1 HIV-1 HXB2 envelope fragment as template. To
exclude PCR induced mutations, a Kpn1/Ear1 fragment
containing approximately 1.2 kb of the gene was exchanged
with the respective sequence from the proviral clone. The
wild-type gp120mn constructs used as controls were cloned by
PCR from HIV-1 MN infected C8166 cells (AIDS Repository,
Rockville, MD) and expressed gp120 either with a native
envelope or a CD5 leader sequence. Since proviral clones
were not available in this case, two clones of each
construct were tested to avoid PCR artifacts. To determine
the amount of secreted gp120 semi-quantitatively,
supernatants of 293T cells transiently transfected by
calcium phosphate co-precipitation were immunoprecipitated
with soluble CD4:immunoglobulin fusion protein and protein A
sepharose.

The results of this analysis (Figure 3) show that
the synthetic gene product is expressed at a very high level
compared to that of the native gp120 controls. The
molecular weight of the synthetic gp120 gene product was
comparable to that of the control proteins (Figure 3) and
appeared to be in the range of 100 to 110 kd. The slightly
faster migration can be explained by the fact that in some
tumor cell lines, e.g., 293T, glycosylation is either not
complete or altered to some extent.

To compare expression more accurately, gp120 protein
levels were quantitated using a gp120 ELISA with CD4 in the
immobilized phase. This analysis shows (Figure 4) that
ELISA data were comparable to the immunoprecipitation data,
with a gp120 concentration of approximately 125 ng/ml for
the synthetic gp120 gene, and less than the background
cutoff (5 ng/ml) for all the native gp120 genes. Thus,
expression of the synthetic gp120 gene appears to be at
least one order of magnitude higher than that of the
wild-type gp120 genes. In the experiment shown the increase
was at least 25-fold.
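
The 25-fold figure is a lower bound, because the native constructs fell below the ELISA cutoff rather than giving a measured value. A minimal sketch of that arithmetic, assuming the two concentrations quoted above (the function name is illustrative and not part of the original procedures):

```python
def min_fold_increase(measured_ng_ml: float, cutoff_ng_ml: float) -> float:
    """Lower bound on the fold increase when the comparator is below the assay cutoff."""
    if cutoff_ng_ml <= 0:
        raise ValueError("cutoff must be positive")
    # The comparator is at most the cutoff, so the true ratio is at least this value.
    return measured_ng_ml / cutoff_ng_ml


synthetic_gp120 = 125.0   # ng/ml, synthetic gene (value quoted in the text)
background_cutoff = 5.0   # ng/ml, ELISA background cutoff (value quoted in the text)

print(f"at least {min_fold_increase(synthetic_gp120, background_cutoff):.0f}-fold")
```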

The Role of rev in gp120 Expression

Since rev appears to exert its effect at several
steps in the expression of a viral transcript, the possible
role of non-translational effects in the improved expression
of the synthetic gp120 gene was tested. First, to rule out
the possibility that negative signal elements conferring
either increased mRNA degradation or nuclear retention were
eliminated by changing the nucleotide sequence, cytoplasmic
mRNA levels were tested. Cytoplasmic RNA was prepared by
NP40 lysis of transiently transfected 293T cells and
subsequent elimination of the nuclei by centrifugation.
Cytoplasmic RNA was subsequently prepared from lysates by
multiple phenol extractions and precipitation, spotted on
nitrocellulose using a slot blot apparatus, and finally
hybridized with an envelope-specific probe.

Briefly, cytoplasmic mRNA of 293 cells transfected with
CDM7, gp120 IIIB, or syngp120 was isolated 36 hours post
transfection. Cytoplasmic RNA of HeLa cells infected with
wild-type vaccinia virus or recombinant virus expressing
gp120 IIIb or the synthetic gp120 gene under the control
of the 7.5 promoter was isolated 16 hours post infection.
Equal amounts were spotted on nitrocellulose using a slot
blot device and hybridized with randomly labeled 1.5 kb
gp120IIIb and syngp120 fragments or human beta-actin. RNA
expression levels were quantitated by scanning the
hybridized membranes with a phosphorimager. The procedures
used are described in greater detail below.

This experiment demonstrated that there was no
significant difference in the mRNA levels of cells
transfected with either the native or synthetic gp120 gene.
In fact, in some experiments the cytoplasmic mRNA level of
the synthetic gp120 gene was even lower than that of the
native gp120 gene.

These data were confirmed by measuring expression
from recombinant vaccinia viruses. Human 293 cells or HeLa
cells were infected with vaccinia virus expressing wild-type
gp120 IIIb or syngp120mn at a multiplicity of infection of
at least 10. Supernatants were harvested 24 hours post
infection and immunoprecipitated with CD4:immunoglobulin
fusion protein and protein A sepharose. The procedures used
in this experiment are described in greater detail below.

This experiment showed that the increased expression 
of the synthetic gene was still observed when the endogenous 
gene product and the synthetic gene product were expressed 
from vaccinia virus recombinants under the control of the 
strong mixed early and late 7.5k promoter. Because vaccinia 
virus mRNAs are transcribed and translated in the cytoplasm, 
increased expression of the synthetic envelope gene in this 
experiment cannot be attributed to improved export from the 
nucleus. This experiment was repeated in two additional 
human cell types, the kidney cancer cell line 293 and HeLa 
cells. As with transfected 293T cells, mRNA levels were 
similar in 293 cells infected with either recombinant 
vaccinia virus. 

Codon Usage in Lentivirus

Because it appears that codon usage has a
significant impact on expression in mammalian cells, the
codon frequency in the envelope genes of other retroviruses
was examined. This study found no clear pattern of codon
preference between retroviruses in general. However, if
viruses from the lentivirus genus, to which HIV-1 belongs,
were analyzed separately, a codon usage bias almost
identical to that of HIV-1 was found. A codon frequency
table from the envelope glycoproteins of a variety of
(predominantly type C) retroviruses excluding the
lentiviruses was prepared, and compared with a codon
frequency table created from the envelope sequences of four
lentiviruses not closely related to HIV-1 (caprine arthritis
encephalitis virus, equine infectious anemia virus, feline
immunodeficiency virus, and visna virus) (Table 2). The
codon usage pattern for lentiviruses is strikingly similar
to that of HIV-1: in all cases but one, the preferred codon
for HIV-1 is the same as the preferred codon for the other
lentiviruses. The exception is proline, which is encoded by
CCT in 41% of non-HIV lentiviral envelope residues, and by
CCA in 40% of residues, a situation which clearly also
reflects a significant preference for the triplet ending in
A. The pattern of codon usage by the non-lentiviral
envelope proteins does not show a similar predominance of A
residues, and is also not as skewed toward third position C
and G residues as is the codon usage for the highly
expressed human genes. In general, non-lentiviral
retroviruses appear to exploit the different codons more
equally, a pattern they share with less highly expressed
human genes.



TABLE 2: Codon frequency in the envelope gene of
lentiviruses (lenti) and non-lentiviral
retroviruses (other)

Amino acid   Codon   Other   Lenti

Ala          GCC       45      13
             GCT       26      37
             GCA       20      46
             GCG        9       3
Arg          CGC       14      [illegible]
             CGT        6       3
             CGA       16       5
             CGG       17       3
             AGA       31      51
             AGG       15      26
Asn          AAC       49      31
             AAT       51      69
Asp          GAC       55      33
             GAT       51      69
Leu          CTC       22       8
             CTT       14       9
             CTA       21      16
             CTG       19      11
             TTA       15      41
             TTG       10      16
Lys          AAA       60      63
             AAG       40      37
Pro          CCC       42      14
             CCT       30      41
             CCA       20      40
             CCG        7       5
Phe          TTC       52      25
             TTT       48      75
Cys          TGC       53      21
             TGT       47      79
Gln          CAA       52      69
             CAG       48      31
Glu          GAA       57      68
             GAG       43      32
Gly          GGC       21       8
             GGT       13       9
             GGA       37      56
             GGG       29      26
His          CAC       51      38
             CAT       49      62
Ile          ATC       38      16
             ATT       31      22
             ATA       31      61
Ser          TCC       38      10
             TCT       17      16
             TCA       18      24
             TCG        6       5
             AGC       13      20
             AGT        7      25
Thr          ACC       44      18
             ACT       27      20
             ACA       19      55
             ACG       10       8
Tyr          TAC       48      28
             TAT       52      72
Val          GTC       36       9
             GTT       17      10
             GTA       22      54
             GTG       25      27


Codon frequency was calculated using the GCG program 
established by the University of Wisconsin Genetics Computer 
Group. Numbers represent the percentage in which a 
particular codon is used. Codon usage of non-lentiviral 
retroviruses was compiled from the envelope precursor 
sequences of bovine leukemia virus, feline leukemia virus,
human T-cell leukemia virus type I, human T-cell
lymphotropic virus type II, the mink cell focus-forming
isolate of murine leukemia virus (MuLV), the Rauscher spleen
focus-forming isolate, the 10A1 isolate, the 4070A 
amphotropic isolate and the myeloproliferative leukemia 
virus isolate, and from rat leukemia virus, simian sarcoma 
virus, simian T-cell leukemia virus, leukemogenic retrovirus 
T1223/B and gibbon ape leukemia virus. The codon frequency 
tables for the non-HIV, non-SIV lentiviruses were compiled 
from the envelope precursor sequences for caprine arthritis 
encephalitis virus, equine infectious anemia virus, feline 
immunodeficiency virus, and visna virus. 
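
The percentages in Table 2 are per-amino-acid codon frequencies. The GCG package itself is not reproduced here; the following is only a minimal sketch of the same kind of calculation, assuming an in-frame coding sequence and the standard genetic code (the function and variable names are illustrative).

```python
from collections import Counter, defaultdict

# Standard genetic code, built in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage_percent(cds: str) -> dict:
    """Return {amino acid: {codon: percent of that amino acid's codons}}."""
    seq = cds.upper().replace("U", "T")
    counts = defaultdict(Counter)
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":   # skip ambiguous bases and stop codons
            continue
        counts[aa][codon] += 1
    return {
        aa: {codon: 100.0 * n / sum(cs.values()) for codon, n in cs.items()}
        for aa, cs in counts.items()
    }


# Toy input: four alanine codons, two of them GCA.
print(codon_usage_percent("GCAGCAGCCGCT"))
```

Pooling the counts from several envelope sequences before converting to percentages would give one table per virus group, as in Table 2.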



In addition to the prevalence of codons containing
an A, lentiviral codons adhere to the HIV pattern of strong
CpG underrepresentation, so that the third position for
alanine, proline, serine and threonine triplets is rarely G.
The retroviral envelope triplets show a similar, but less
pronounced, underrepresentation of CpG. The most obvious
difference between lentiviruses and other retroviruses with
respect to CpG prevalence lies in the usage of the CGX
variant of arginine triplets, which is reasonably frequently
represented among the retroviral envelope coding sequences,
but is almost never present among the comparable lentivirus
sequences.



Differences in rev Dependence Between Native and Synthetic
gp120

To examine whether regulation by rev is connected to
HIV-1 codon usage, the influence of rev on the expression of
both the native and the synthetic gene was investigated. Since
regulation by rev requires the rev-binding site RRE in cis,
constructs were made in which this binding site was cloned
into the 3' untranslated region of both the native and the
synthetic gene. These plasmids were cotransfected with rev
or a control plasmid in trans into 293T cells, and gp120
expression levels in supernatants were measured
semiquantitatively by immunoprecipitation. The procedures
used in this experiment are described in greater detail
below.

As shown in Figure 5A and Figure 5B, rev upregulates
the native gp120 gene, but has no effect on the
expression of the synthetic gp120 gene. Thus, the action of
rev is not apparent on a substrate which lacks the coding
sequence of endogenous viral envelope sequences.

Expression of a synthetic ratTHY-1 gene with HIV envelope 
codons 

The above-described experiment suggests that in fact
"envelope sequences" have to be present for rev regulation.
In order to test this hypothesis, a synthetic version of the
gene encoding the small, typically highly expressed cell
surface protein, ratTHY-1 antigen, was prepared. The
synthetic version of the ratTHY-1 gene was designed to have
a codon usage like that of HIV gp120. In designing this
synthetic gene, AUUUA sequences, which are associated with
mRNA instability, were avoided. In addition, two
restriction sites were introduced to simplify manipulation
of the resulting gene (Figure 6). This synthetic gene with
the HIV envelope codon usage (rTHY-1env) was generated using
three 150 to 170 mer oligonucleotides (Figure 7). In
contrast to the syngp120mn gene, PCR products were directly
cloned and assembled in pUC12, and subsequently cloned into
pCDM7.
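
The design step described above (recoding a protein with envelope-like codons while avoiding AUUUA) can be sketched as follows. This is a hedged illustration only: the codon choices are simply the highest "Lenti" percentage per amino acid in Table 2, which is an assumption and need not match the codons actually chosen for rTHY-1env, and the function names are hypothetical.

```python
# One codon per amino acid: the most frequent "Lenti" codon in Table 2
# (assumption of this sketch; Met and Trp have a single codon each).
LENTI_PREFERRED = {
    "A": "GCA", "R": "AGA", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGA", "H": "CAT", "I": "ATA",
    "L": "TTA", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "AGT", "T": "ACA", "W": "TGG", "Y": "TAT", "V": "GTA",
}


def back_translate(protein: str, table: dict = LENTI_PREFERRED) -> str:
    """Recode a protein (one-letter code) using one fixed codon per residue."""
    return "".join(table[aa] for aa in protein.upper())


def find_motif(dna: str, motif: str = "ATTTA") -> list:
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    hits = []
    start = dna.find(motif)
    while start != -1:
        hits.append(start)
        start = dna.find(motif, start + 1)
    return hits


gene = back_translate("MNPVIS")
print(gene, find_motif(gene))

# An instability motif can arise across a codon junction (AAT + TTA),
# in which case one of the two codons would have to be swapped.
print(find_motif(back_translate("NL")))
```

Scanning the DNA form for ATTTA is equivalent to scanning the mRNA for AUUUA.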

Expression levels of native rTHY-1 and rTHY-1 with
the HIV envelope codons were quantitated by
immunofluorescence of transiently transfected 293T cells.
Figure 8 shows that the expression of the native THY-1 gene
is almost two orders of magnitude above the background level
of the control transfected cells (pCDM7). In contrast,
expression of the synthetic ratTHY-1 is substantially lower
than that of the native gene (shown by the shift of the
peak towards a lower channel number).

To prove that no negative sequence elements
promoting mRNA degradation were inadvertently introduced, a
construct was generated in which the rTHY-1env gene was
cloned at the 3' end of the synthetic gp120 gene (Figure
9B). In this experiment 293T cells were transfected with
either the syngp120mn gene or the syngp120/ratTHY-1env
fusion gene (syngp120mn.rTHY-1env). Expression was measured
by immunoprecipitation with CD4:IgG fusion protein and
protein A agarose. The procedures used in this experiment
are described in greater detail below.

Since the synthetic gp120 gene has a UAG stop
codon, rTHY-1env is not translated from this transcript. If
negative elements conferring enhanced degradation were
present in the sequence, gp120 protein levels expressed from
this construct should be decreased in comparison to the
syngp120mn construct without rTHY-1env. Figure 9A shows
that the expression of both constructs is similar,
indicating that the low expression must be linked to
translation.



Rev-dependent expression of synthetic ratTHY-1 gene 
with envelope codons 

To explore whether rev is able to regulate
expression of a ratTHY-1 gene having env codons, a construct
was made with a rev-binding site in the 3' end of the
rTHY-1env open reading frame. To measure rev-responsiveness
of the ratTHY-1env construct having a 3' RRE, human 293T
cells were cotransfected with ratTHY-1envrre and either CDM7
or pCMVrev. At 60 hours post transfection cells were detached
with 1 mM EDTA in PBS and stained with the OX-7 anti-rTHY-1
mouse monoclonal antibody and a secondary FITC-conjugated
antibody. Fluorescence intensity was measured using an EPICS
XL cytofluorometer. These procedures are described in
greater detail below.

In repeated experiments, a slight increase of
rTHY-1env expression was detected if rev was cotransfected
with the rTHY-1env gene. To further increase the sensitivity
of the assay system a construct expressing a secreted version
of rTHY-1env was generated. This construct should produce
more reliable data because the accumulated amount of
secreted protein in the supernatant reflects the result of
protein production over an extended period, in contrast to
surface expressed protein, which appears to more closely
reflect the current production rate. A gene capable of
expressing a secreted form was prepared by PCR using forward
and reverse primers annealing 3' of the endogenous leader
sequence and 5' of the sequence motif required for
phosphatidylinositol glycan anchorage, respectively. The PCR
product was cloned into a plasmid which already contained a
CD5 leader sequence, thus generating a construct in which
the membrane anchor has been deleted and the leader sequence
exchanged by a heterologous (and probably more efficient)
leader peptide.

The rev-responsiveness of the secreted form of
ratTHY-1env was measured by immunoprecipitation of
supernatants of human 293T cells cotransfected with a
plasmid expressing a secreted form of ratTHY-1env and the
RRE sequence in cis (rTHY-1envPI-rre) and either CDM7 or
pCMVrev. The rTHY-1envPI-RRE construct was made by PCR
using the oligonucleotide cgcggggctagcgcaaagagtaataagtttaac
(SEQ ID NO: 38) as a forward primer, the oligonucleotide
cgcggatcccttgtattttgtactaata (SEQ ID NO: 39) as reverse
primer, and the synthetic rTHY-1env construct as a template.
After digestion with NheI and NotI the PCR fragment was
cloned into a plasmid containing CD5 leader and RRE
sequences. Supernatants of 35S-labeled cells were
harvested 72 hours post transfection, precipitated with a
mouse monoclonal antibody OX7 against rTHY-1 and anti-mouse
IgG sepharose, and run on a 12% reducing SDS-PAGE.

In this experiment the induction of rTHY-1env by rev
was much more prominent and clear-cut than in the
above-described experiment and strongly suggests that rev is able
to translationally regulate transcripts that are suppressed
by low-usage codons.

Rev-independent expression of a rTHY-1env:immunoglobulin
fusion protein

To test whether low-usage codons must be present
throughout the whole coding sequence or whether a short
region is sufficient to confer rev-responsiveness, a
rTHY-1env:immunoglobulin fusion protein was generated. In
this construct the rTHY-1env gene (without the sequence
motif responsible for phosphatidylinositol glycan anchorage)
is linked to the human IgG1 hinge, CH2 and CH3 domains.
This construct was generated by anchor PCR using primers
with NheI and BamHI restriction sites and rTHY-1env as
template. The PCR fragment was cloned into a plasmid
containing the leader sequence of the CD5 surface molecule
and the hinge, CH2 and CH3 parts of human IgG1
immunoglobulin. A HindIII/EagI fragment containing the
rTHY-1envegl insert was subsequently cloned into a
pCDM7-derived plasmid with the RRE sequence.

To measure the response of the rTHY-1env/immunoglobulin
fusion gene (rTHY-1enveglrre) to rev, human 293T
cells were cotransfected with rTHY-1enveglrre and either
pCDM7 or pCMVrev. The rTHY-1enveglrre construct was made by
anchor PCR using forward and reverse primers with NheI and
BamHI restriction sites, respectively. The PCR fragment was
cloned into a plasmid containing a CD5 leader and human IgG1
hinge, CH2 and CH3 domains. Supernatants of 35S-labeled
cells were harvested 72 hours post transfection,
precipitated with a mouse monoclonal antibody OX7 against
rTHY-1 and anti-mouse IgG sepharose, and run on a 12%
reducing SDS-PAGE. The procedures used are described in
greater detail below.

As with the product of the rTHY-1envPI- gene, this
rTHY-1env/immunoglobulin fusion protein is secreted into the
supernatant. Thus, this gene would be expected to respond to
rev induction. However, in contrast to rTHY-1envPI-,
cotransfection of rev in trans induced no or only a
negligible increase of rTHY-1envegl expression.

The expression of rTHY-1:immunoglobulin fusion
protein with native rTHY-1 or HIV envelope codons was
measured by immunoprecipitation. Briefly, human 293T cells
were transfected with either rTHY-1envegl (env codons) or
rTHY-1wtegl (native codons). The rTHY-1wtegl construct was
generated in a manner similar to that used for the
rTHY-1envegl construct, with the exception that a plasmid
containing the native rTHY-1 gene was used as template.
Supernatants of 35S-labeled cells were harvested 72 hours
post transfection, precipitated with a mouse monoclonal
antibody OX7 against rTHY-1 and anti-mouse IgG sepharose,
and run on a 12% reducing SDS-PAGE. The procedures used in
this experiment are described in greater detail below.

Expression levels of rTHY-1envegl were decreased in
comparison to a similar construct with wild-type rTHY-1 as
the fusion partner, but were still considerably higher than
rTHY-1env. Accordingly, both parts of the fusion protein
influenced expression levels. The addition of rTHY-1env did
not restrict expression to an equal level as seen for
rTHY-1env alone. Thus, regulation by rev appears to be
ineffective if protein expression is not almost completely
suppressed.

Codon preference in HIV-1 envelope genes 

Direct comparison between codon usage frequency of 
HIV envelope and highly expressed human genes reveals a 
striking difference for all twenty amino acids. One simple 
measure of the statistical significance of this codon 
preference is the finding that among the nine amino acids 
with two fold codon degeneracy, the favored third residue is 
A or U in all nine. The probability that all nine of two 
equiprobable choices will be the same is approximately 
0.004, and hence by any conventional measure the third 
residue choice cannot be considered random. Further 
evidence of a skewed codon preference is found among the 
more degenerate codons, where a strong selection for 
triplets bearing adenine can be seen. This contrasts with 
the pattern for highly expressed genes, which favor codons 
bearing C, or less commonly G, in the third position of 
codons with three or more fold degeneracy. 
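
The figure of approximately 0.004 can be checked directly. Assuming nine independent, equiprobable binary choices, "all nine the same" covers two outcomes (all of one class or all of the other), so the probability is 2 x (1/2)^9 = 1/256. A short sketch of that arithmetic (variable names are illustrative):

```python
from fractions import Fraction

n_amino_acids = 9                 # amino acids with two-fold codon degeneracy
p_single = Fraction(1, 2)         # each third-position class assumed equally likely

# All nine agree: either all in the first class or all in the second.
p_all_same = 2 * p_single ** n_amino_acids
# All nine fall in one specified class (for example, A or U).
p_all_a_or_u = p_single ** n_amino_acids

print(p_all_same, float(p_all_same))      # 1/256 0.00390625
print(p_all_a_or_u, float(p_all_a_or_u))  # 1/512 0.001953125
```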

The systematic exchange of native codons with codons
of highly expressed human genes dramatically increased
expression of gp120. A quantitative analysis by ELISA
showed that expression of the synthetic gene was at least
25-fold higher in comparison to native gp120 after transient
transfection into human 293 cells. The concentration levels
in the ELISA experiment shown were rather low. Since an
ELISA was used for quantification which is based on gp120
binding to CD4, only native, non-denatured material was
detected. This may explain the apparent low expression.
Measurement of cytoplasmic mRNA levels demonstrated that the
difference in protein expression is due to translational
differences and not mRNA stability.

Retroviruses in general do not show a similar
preference towards A and T as found for HIV. But if this
family was divided into two subgroups, lentiviruses and
non-lentiviral retroviruses, a similar preference to A and,
less frequently, T, was detected at the third codon position
for lentiviruses. Thus, the available evidence suggests that
lentiviruses retain a characteristic pattern of envelope
codons not because of an inherent advantage to the reverse
transcription or replication of such residues, but rather
for some reason peculiar to the physiology of that class of
viruses. The major difference between lentiviruses and
non-complex retroviruses is the presence of additional
regulatory and non-essential accessory genes in lentiviruses,
as already mentioned. Thus, one simple explanation for the
restriction of envelope expression might be that an important
regulatory mechanism of one of these additional molecules is
based on it. In fact, one of these proteins, rev, most likely
has homologues in all lentiviruses. Thus
codon usage in viral mRNA is used to create a class of
transcripts which is susceptible to the stimulatory action
of rev. This hypothesis was proved using a similar strategy
as above, but this time codon usage was changed in the
inverse direction. Codon usage of a highly expressed
cellular gene was substituted with the most frequently used
codons in the HIV envelope. As assumed, expression levels
were considerably lower in comparison to the native
molecule, almost two orders of magnitude when analyzed by
immunofluorescence of the surface expressed molecule. If
rev was coexpressed in trans and an RRE element was present
in cis, only a slight induction was found for the surface
molecule. However, if THY-1 was expressed as a secreted
molecule, the induction by rev was much more prominent,
supporting the above hypothesis. This can probably be
explained by accumulation of secreted protein in the
supernatant, which considerably amplifies the rev effect.
If rev only induces a minor increase for surface molecules
in general, induction of HIV envelope by rev cannot have the
purpose of an increased surface abundance, but rather of an
increased intracellular gp160 level. It is completely
unclear at the moment why this should be the case.

To test whether small subtotal elements of a gene
are sufficient to restrict expression and render it
rev-dependent, rTHY-1env:immunoglobulin fusion proteins were
generated, in which only about one third of the total gene
had the envelope codon usage. Expression levels of this
construct were on an intermediate level, indicating that the
rTHY-1env negative sequence element is not dominant over the
immunoglobulin part. This fusion protein was not or only
slightly rev-responsive, indicating that only genes almost
completely suppressed can be rev-responsive.

Another characteristic feature that was found in the
codon frequency tables is a striking underrepresentation of
CpG-containing triplets. In a comparative study of codon
usage in E. coli, yeast, Drosophila and primates it was shown
that in a high number of analyzed primate genes the 8 least
used codons contain all codons with the CpG dinucleotide
sequence. Avoidance of codons containing this dinucleotide
motif was also found in the sequence of other retroviruses.
It seems plausible that the reason for underrepresentation
of CpG-bearing triplets has something to do with avoidance
of gene silencing by methylation of CpG cytosines. The
observed number of CpG dinucleotides for HIV as a whole is
about one fifth that expected on the basis of the base
composition. This might indicate that the possibility of
high expression is restored, and that the gene in fact has
to be highly expressed at some point during viral
pathogenesis.
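
A comparison of CpG occurrence with the number expected from base composition can be sketched as below. The text does not state which formula was used; this sketch assumes the common observed/expected ratio, in which the expected count is (number of C x number of G) / sequence length. The function name is illustrative.

```python
def cpg_obs_exp(seq: str):
    """Observed/expected CpG ratio; returns None if the ratio is undefined."""
    s = seq.upper()
    n = len(s)
    c_count = s.count("C")
    g_count = s.count("G")
    if n < 2 or c_count == 0 or g_count == 0:
        return None
    observed = s.count("CG")              # "CG" cannot overlap itself
    expected = c_count * g_count / n      # expectation from base composition alone
    return observed / expected


# Toy sequences only, not viral data.
print(cpg_obs_exp("ACGTACGT"))   # CpG enriched relative to composition
print(cpg_obs_exp("CCAGGTCA"))   # no CpG at all
```

A ratio near 0.2 would correspond to the "about one fifth" figure quoted for HIV.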

The results presented herein clearly indicate that
codon preference has a severe effect on protein levels, and
suggest that translational elongation is controlling
mammalian gene expression. However, other factors may play
a role. First, abundance of not maximally loaded mRNAs in
eukaryotic cells indicates that initiation is rate limiting
for translation in at least some cases, since otherwise all
transcripts would be completely covered by ribosomes.
Furthermore, if ribosome stalling and subsequent mRNA
degradation were the mechanism, suppression by rare codons
could most likely not be reversed by any regulatory
mechanism like the one presented herein. One possible
explanation for the influence of both initiation and
elongation on translational activity is that the rate of
initiation, or access to ribosomes, is controlled in part by
cues distributed throughout the RNA, such that the
lentiviral codons predispose the RNA to accumulate in a pool
of poorly initiated RNAs. However, this limitation need not
be kinetic; for example, the choice of codons could
influence the probability that a given translation product,
once initiated, is properly completed. Under this
mechanism, abundance of less favored codons would incur a
significant cumulative probability of failure to complete
the nascent polypeptide chain. The sequestered RNA would
then be lent an improved rate of initiation by the action of
rev. Since adenine residues are abundant in rev-responsive
transcripts, it could be that RNA adenine methylation
mediates this translational suppression.

Detailed Procedures

The following procedures were used in the above-described
experiments.

Sequence Analysis

Sequence analyses employed the software developed by
the University of Wisconsin Genetics Computer Group.

Plasmid constructions

Plasmid constructions employed the following
methods. Vector and insert DNA were digested at a
concentration of 0.5 µg/10 µl in the appropriate restriction
buffer for 1-4 hours (total reaction volume approximately
30 µl). Digested vector was treated with 10% (v/v) of 1
µg/ml calf intestine alkaline phosphatase for 30 min prior
to gel electrophoresis. Both vector and insert digests (5
to 10 µl each) were run on a 1.5% low melting agarose gel
with TAE buffer. Gel slices containing bands of interest
were transferred into a 1.5 ml reaction tube, melted at 65 °C
and directly added to the ligation without removal of the
agarose. Ligations were typically done in a total volume of
25 µl in 1x Low Buffer and 1x Ligation Additions with
200-400 U of ligase, 1 µl of vector, and 4 µl of insert. When
necessary, 5' overhanging ends were filled by adding 1/10
volume of 250 µM dNTPs and 2-5 U of Klenow polymerase to
heat inactivated or phenol extracted digests and incubating
for approximately 20 min at room temperature. When
necessary, 3' overhanging ends were filled by adding 1/10
volume of 2.5 mM dNTPs and 5-10 U of T4 DNA polymerase to
heat inactivated or phenol extracted digests, followed by
incubation at 37 °C for 30 min. The following buffers were
used in these reactions: 10x Low buffer (60 mM Tris HCl, pH
7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM
β-mercaptoethanol, 0.02% NaN3); 10x Medium buffer (60 mM Tris
HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM
β-mercaptoethanol, 0.02% NaN3); 10x High buffer (60 mM Tris
HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM
β-mercaptoethanol, 0.02% NaN3); 10x Ligation additions (1 mM
ATP, 20 mM DTT, 1 mg/ml BSA, 10 mM spermidine); 50x TAE (2 M
Tris acetate, 50 mM EDTA).
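
The buffers above are listed as concentrated stocks. As a small worked example of how such stocks translate into working conditions, the sketch below converts the stated 50x TAE composition to 1x and computes the volume of a 10x stock needed for a 25 µl ligation. The helper names are illustrative and not part of the original procedures.

```python
def working_concentration(stock_conc: float, fold: float) -> float:
    """Concentration after diluting an N-fold stock to 1x (same units as input)."""
    return stock_conc / fold


def stock_volume(total_volume_ul: float, fold: float) -> float:
    """Volume of an N-fold stock needed for a reaction of the given total volume."""
    return total_volume_ul / fold


# 50x TAE: 2 M (2000 mM) Tris acetate, 50 mM EDTA
print(working_concentration(2000.0, 50), "mM Tris acetate at 1x")
print(working_concentration(50.0, 50), "mM EDTA at 1x")

# 25 µl ligation at 1x from a 10x stock
print(stock_volume(25.0, 10), "µl of 10x stock")
```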

Oligonucleotide synthesis and purification

Oligonucleotides were produced on a Milligen 8750
synthesizer (Millipore). The columns were eluted with 1 ml
of 30% ammonium hydroxide, and the eluted oligonucleotides
were deblocked at 55 °C for 6 to 12 hours. After
deblocking, 150 µl of oligonucleotide were precipitated
with 10x volume of unsaturated n-butanol in 1.5 ml reaction
tubes, followed by centrifugation at 15,000 rpm in a
microfuge. The pellet was washed with 70% ethanol and
resuspended in 50 µl of H2O. The concentration was
determined by measuring the optical density at 260 nm in a
dilution of 1:333 (1 OD260 = 30 µg/ml).
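
The stated dilution and conversion factor make the calculation convenient: 333 x 30 µg/ml is about 10,000 µg/ml, so the reading at 260 nm multiplied by ten approximates the stock concentration in mg/ml. A minimal sketch, assuming the absorbance reading is the only input (the function name and the example reading of 0.5 are illustrative):

```python
def oligo_concentration_ug_per_ml(a260: float,
                                  dilution_factor: float = 333.0,
                                  ug_per_ml_per_od: float = 30.0) -> float:
    """Stock concentration from an A260 reading of a diluted sample.

    Defaults follow the text: 1:333 dilution, 1 OD260 = 30 µg/ml.
    """
    return a260 * dilution_factor * ug_per_ml_per_od


reading = 0.5  # hypothetical A260 of the 1:333 dilution
conc = oligo_concentration_ug_per_ml(reading)
print(f"{conc:.0f} µg/ml (about {conc / 1000:.1f} mg/ml)")
```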

The following oligonucleotides were used for
construction of the synthetic gp120 gene (all sequences
shown in this text are in 5' to 3' direction).

oligo 1 forward (NheI): cgc ggg cta gcc acc gag aag
ctg (SEQ ID NO: 1).

oligo 1: acc gag aag ctg tgg gtg acc gtg tac tac
ggc gtg ccc gtg tgg aag ag ag gcc acc acc acc ctg ttc tgc
gcc agc gac gcc aag gcg tac gac acc gag gtg cac aac gtg tgg
gcc acc cag gcg tgc gtg ccc acc gac ccc aac ccc cag gag gtg
gag ctc gtg aac gtg acc gag aac ttc aac at (SEQ ID NO: 2).

oligo 1 reverse: cca cca tgt tgt tct tcc aca tgt tga
agt tct c (SEQ ID NO: 3).

oligo 2 forward: gac cga gaa ctt caa cat gtg gaa
gaa caa cat (SEQ ID NO: 4).

oligo 2: tgg aag aac aac atg gtg gag cag atg cat gag
gac atc atc agc ctg tgg gac cag agc ctg aag ccc tgc gtg aag
ctg acc cc ctg tgc gtg acc tg aac tgc acc gac ctg agg aac
acc acc aac acc aac ac agc acc gcc aac aac aac agc aac agc
gag ggc acc atc aag ggc ggc gag atg (SEQ ID NO: 5).

oligo 2 reverse (PstI): gtt gaa gct gca gtt ctt cat
ctc gcc gcc ctt (SEQ ID NO: 6).

oligo 3 forward (PstI): gaa gaa ctg cag ctt caa cat
cac cac cag c (SEQ ID NO: 7).

oligo 3: aac atc acc acc agc atc cgc gac aag atg cag
aag gag tac gcc ctg ctg tac aag ctg gat atc gtg agc atc gac
aac gac agc acc agc tac cgc ctg atc tcc tgc aac acc agc gtg
atc acc cag gcc tgc ccc aag atc agc ttc gag ccc atc ccc atc
cac tac tgc gcc ccc gcc ggc ttc gcc (SEQ ID NO: 8).

oligo 3 reverse: gaa ctt ctt gtc ggc ggc gaa gcc
ggc ggg (SEQ ID NO: 9).

oligo 4 forward: gcg ccc ccg ccg gct tcg cca tcc
tga agt gca acg aca aga agt tc (SEQ ID NO: 10).

oligo 4: gcc gac aag aag ttc agc ggc aag ggc agc
tgc aag aac gtg agc acc gtg cag tgc acc cac ggc atc cgg ccg
gtg gtg agc acc cag ctc ctg ctg aac ggc agc ctg gcc gag gag
gag gtg gtg atc cgc agc gag aac ttc acc gac aac gcc aag acc
atc atc gtg cac ctg aat gag agc gtg cag atc (SEQ ID NO: 11).

oligo 4 reverse (MluI): agt tgg gac gcg tgc agt tga
tct gca cgc tct c (SEQ ID NO: 12).

oligo 5 forward (MluI): gag agc gtg cag atc aac tgc
acg cgt ccc (SEQ ID NO: 13).

oligo 5: aac tgc acg cgt ccc aac tac aac aag cgc
aag cgc atc cac atc ggc ccc ggg cgc gcc ttc tac acc acc aag
aac atc atc ggc acc atc ctc cag gcc cac tgc aac atc tct aga
(SEQ ID NO: 14).

oligo 5 reverse: gtc gtt cca ctt ggc tct aga gat
gtt gca (SEQ ID NO: 15).

oligo 6 forward: gca aca tct cta gag cca agt gga
acg ac (SEQ ID NO: 16).

oligo 6: gcc aag tgg aac gac acc ctg cgc cag atc
gtg agc aag ctg aag gag cag ttc aag aac aag acc atc gtg ttc
ac cag agc agc ggc ggc gac ccc gag atc gtg atg cac agc ttc
aac tgc ggc ggc (SEQ ID NO: 17).

oligo 6 reverse (EcoRI): gca gta gaa gaa ttc gcc gcc
gca gtt ga (SEQ ID NO: 18).

oligo 7 forward (EcoRI): tca act gcg gcg gcg aat
tct tct act gc (SEQ ID NO: 19).

oligo 7: ggc gaa ttc ttc tac tgc aac acc agc ccc
ctg ttc aac agc acc tgg aac ggc aac aac acc tgg aac aac acc
acc ggc agc aac aac aat att acc ctc cag tgc aag atc aag cag
atc atc aac atg tgg cag gag gtg ggc aag gcc atg tac gcc ccc
ccc atc gag ggc cag atc cgg tgc agc agc (SEQ ID NO: 20).

oligo 7 reverse: gca gac cgg tga tgt tgc tgc tgc
acc gga tct ggc cct c (SEQ ID NO: 21).

oligo 8 forward: cga ggg cca gat ccg gtg cag cag
caa cat cac cgg tct g (SEQ ID NO: 22).

oligo 8: aac atc acc ggt ctg ctg ctg acc cgc gac
ggc ggc aag gac acc gac acc aac gac acc gaa atc ttc cgc ccc
ggc ggc ggc gac atg cgc gac aac tgg aga tct gag ctg tac aag
tac aag gtg gtg acg atc gag ccc ctg ggc gtg gcc ccc acc aag
gcc aag cgc cgc gtg gtg cag cgc gag aag cgc (SEQ ID NO: 23).

oligo 8 reverse (NotI): cgc ggg cgg ccg ctt tag cgc
ttc tcg cgc tgc acc ac (SEQ ID NO: 24).

The following oligonucleotides were used for the
construction of the ratTHY-1env gene.

oligo 1 forward (BamHI/Hind3): cgc ggg gga tcc aag
ctt acc atg att cca gta ata agt (SEQ ID NO: 25).

oligo 1: atg aat cca gta ata agt ata aca tta tta
tta agt gta tta caa atg agt aga gga caa aga gta ata agt tta
aca gca tct tta gta aat caa aat ttg aga tta gat tgt aga cat
gaa aat aat aca aat ttg cca ata caa cat gaa ttt tca tta acg
(SEQ ID NO: 26).

oligo 1 reverse (EcoRI/MluI): cgc ggg gaa ttc acg
cgt taa tga aaa ttc atg ttg (SEQ ID NO: 27).

oligo 2 forward (BamHI/MluI): cgc gga tcc acg cgt
gaa aaa aaa aaa cat (SEQ ID NO: 28).

oligo 2: cgt gaa aaa aaa aaa cat gta tta agt gga
aca tta gga gta cca gaa cat aca tat aga agt aga gta aat ttg
ttt agt gat aga ttc ata aaa gta tta aca tta gca aat ttt aca
aca aaa gat gaa gga gat tat atg tgt gag (SEQ ID NO: 29).

oligo 2 reverse (EcoRI/SacI): cgc gaa ttc gag ctc
aca cat ata atc tcc (SEQ ID NO: 30).

oligo 3 forward (BamHI/SacI): cgc gga tcc gag ctc
aga gta agt gga caa (SEQ ID NO: 31).

oligo 3: ctc aga gta agt gga caa aat cca aca agt
agt aat aaa aca ata aat gta ata aga gat aaa tta gta aaa tgt
ga gga ata agt tta tta gta caa aat aca agt tgg tta tta tta
tta tta tta agt tta agt ttt tta caa gca aca gat ttt ata agt
tta tga (SEQ ID NO: 32).

oligo 3 reverse (EcoRI/NotI): cgc gaa ttc gcg gcc
gct tca taa act tat aaa atc (SEQ ID NO: 33).

Polymerase Chain Reaction 

Short, overlapping 15 to 25 mer oligonucleotides
annealing at both ends were used to amplify the long
oligonucleotides by polymerase chain reaction (PCR). Typical
PCR conditions were: 35 cycles, 55°C annealing temperature,
0.2 sec extension time. PCR products were gel purified,
phenol extracted, and used in a subsequent PCR to generate
longer fragments consisting of two adjacent small fragments.
These longer fragments were cloned into a CDM7-derived
plasmid containing a leader sequence of the CD5 surface
molecule followed by a NheI/PstI/MluI/EcoRI/BamHI
polylinker.

The following solutions were used in these
reactions: 10x PCR buffer (500 mM KCl, 100 mM Tris HCl, pH
7.5, 8 mM MgCl2, 2 mM each dNTP). The final buffer was
complemented with 10% DMSO to increase fidelity of the Taq
polymerase.
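As a rough illustration of the buffer recipe above, the following Python sketch computes the working (1x) concentrations obtained from the stated 10x PCR buffer. The function name and dictionary layout are hypothetical; only the stock concentrations come from the text.

```python
# 10x PCR buffer components as stated in the text (all values in mM).
STOCK_10X_MM = {
    "KCl": 500.0,
    "Tris HCl pH 7.5": 100.0,
    "MgCl2": 8.0,
    "each dNTP": 2.0,
}

def working_concentrations(stock_mm, stock_fold=10.0):
    """Return the 1x concentration (mM) of every component of an N-fold stock."""
    return {name: conc / stock_fold for name, conc in stock_mm.items()}

working = working_concentrations(STOCK_10X_MM)
for name, conc in working.items():
    print(f"{name}: {conc:g} mM at 1x")
```

The DMSO supplement (10%) is stated separately in the text as a final concentration and is therefore not scaled here.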

Small scale DNA preparation 

Transformed bacteria were grown in 3 ml LB cultures
for more than 6 hours or overnight. Approximately 1.5 ml of
each culture was poured into 1.5 ml microfuge tubes, spun
for 20 seconds to pellet the cells, and resuspended in 200 µl
of solution I. Subsequently 400 µl of solution II and 300 µl
of solution III were added. The microfuge tubes were capped,
mixed, and spun for > 30 sec. Supernatants were transferred
into fresh tubes and phenol extracted once. DNA was
precipitated by filling the tubes with isopropanol, mixing,
and spinning in a microfuge for > 2 min. The pellets were
rinsed in 70% ethanol and resuspended in 50 µl dH2O
containing 10 µl of RNAse A. The following media and
solutions were used in these procedures: LB medium (1.0%
NaCl, 0.5% yeast extract, 1.0% tryptone); solution I (10 mM
EDTA pH 8.0); solution II (0.2 M NaOH, 1.0% SDS); solution
III (2.5 M KOAc, 2.5 M glacial acetic acid); phenol (pH
adjusted to 6.0, overlaid with TE); TE (10 mM Tris HCl, pH
7.5, 1 mM EDTA pH 8.0).

Large scale DNA preparation 

One liter cultures of transformed bacteria were
grown 24 to 36 hours (MC1061p3 transformed with pCDM
derivatives) or 12 to 16 hours (MC1061 transformed with pUC
derivatives) at 37°C in either M9 bacterial medium (pCDM
derivatives) or LB (pUC derivatives). Bacteria were spun
down in 1 liter bottles using a Beckman J6 centrifuge at
4,200 rpm for 20 min. The pellet was resuspended in 40 ml
of solution I. Subsequently, 80 ml of solution II and 40 ml
of solution III were added and the bottles were shaken
semivigorously until lumps of 2 to 3 mm size developed. The
bottle was spun at 4,200 rpm for 5 min and the supernatant
was poured through cheesecloth into a 250 ml bottle.
Isopropanol was added to the top and the bottle was
spun at 4,200 rpm for 10 min. The pellet was resuspended in
4.1 ml of solution I and added to 4.5 g of cesium chloride,
0.3 ml of 10 mg/ml ethidium bromide, and 0.1 ml of 1% Triton
X100 solution. The tubes were spun in a Beckman J2 high
speed centrifuge at 10,000 rpm for 5 min. The supernatant
was transferred into Beckman Quick Seal ultracentrifuge
tubes, which were then sealed and spun in a Beckman
ultracentrifuge using a NVT90 fixed angle rotor at 80,000
rpm for > 2.5 hours. The band was extracted under visible
light using a 1 ml syringe and 20 gauge needle. An equal
volume of dH2O was added to the extracted material. DNA was
extracted once with n-butanol saturated with 1 M sodium
chloride, followed by addition of an equal volume of 10 M
ammonium acetate/1 mM EDTA. The material was poured into a
13 ml snap tube which was then filled to the top with
absolute ethanol, mixed, and spun in a Beckman J2 centrifuge
at 10,000 rpm for 10 min. The pellet was rinsed with 70%
ethanol and resuspended in 0.5 to 1 ml of H2O. The DNA
concentration was determined by measuring the optical
density at 260 nm in a dilution of 1:200 (1 OD260 = 50
µg/ml).
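The concentration estimate described above can be sketched in Python as follows. The conversion factor (1 OD260 = 50 µg/ml) and the 1:200 dilution are taken from the text; the function name and the example absorbance reading of 0.25 are hypothetical.

```python
def dna_concentration_ug_per_ml(od260, dilution_factor=200.0, ug_per_ml_per_od=50.0):
    """Estimate the concentration of the undiluted DNA stock in ug/ml.

    od260            -- absorbance of the diluted sample at 260 nm
    dilution_factor  -- 200 for the 1:200 dilution stated in the text
    ug_per_ml_per_od -- 50 ug/ml per OD260 unit, as stated in the text
    """
    return od260 * ug_per_ml_per_od * dilution_factor

# Hypothetical reading, for illustration only.
example_od = 0.25
stock = dna_concentration_ug_per_ml(example_od)
print(f"OD260 = {example_od} -> {stock:g} ug/ml ({stock / 1000:g} mg/ml)")
```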

The following media and buffers were used in these
procedures: M9 bacterial medium (10 g M9 salts, 10 g
casamino acids (hydrolyzed), 10 ml M9 additions, 7.5 µg/ml
tetracycline (500 µl of a 15 mg/ml stock solution), 12.5
µg/ml ampicillin (125 µl of a 10 mg/ml stock solution)); M9
additions (10 mM CaCl2, 100 mM MgSO4, 200 µg/ml thiamine,
70% glycerol); LB medium (1.0% NaCl, 0.5% yeast extract,
1.0% tryptone); Solution I (10 mM EDTA pH 8.0); Solution II
(0.2 M NaOH, 1.0% SDS); Solution III (2.5 M KOAc, 2.5 M HOAc).

Sequencing

Synthetic genes were sequenced by the Sanger
dideoxynucleotide method. In brief, 20 to 50 µg of
double-stranded plasmid DNA was denatured in 0.5 M NaOH for
5 min. Subsequently the DNA was precipitated with 1/10
volume of sodium acetate (pH 5.2) and 2 volumes of ethanol
and centrifuged for 5 min. The pellet was washed with 70%
ethanol and resuspended at a concentration of 1 µg/µl. The
annealing reaction was carried out with 4 µg of template DNA
and 40 ng of primer in 1x annealing buffer in a final volume
of 10 µl. The reaction was heated to 65°C and slowly cooled
to 37°C.

In a separate tube 1 µl of 0.1 M DTT, 2 µl of
labeling mix, 0.75 µl of dH2O, 1 µl of [35S]dATP (10 µCi),
and 0.25 µl of Sequenase™ (12 U/µl) were added for each
reaction. Five µl of this mix were added to each annealed
primer-template tube and incubated for 5 min at room
temperature. For each labeling reaction 2.5 µl of each of
the 4 termination mixes were added on a Terasaki plate and
prewarmed at 37°C. At the end of the incubation period 3.5
µl of labeling reaction were added to each of the 4
termination mixes. After 5 min, 4 µl of stop solution were
added to each reaction and the Terasaki plate was incubated
at 80°C for 10 min in an oven. The sequencing reactions
were run on a 5% denaturing polyacrylamide gel. An acrylamide
solution was prepared by adding 200 ml of 10x TBE buffer and
957 ml of dH2O to 100 g of acrylamide:bisacrylamide (29:1).
A 5% polyacrylamide, 46% urea, 1x TBE gel was prepared by
combining 38 ml of acrylamide solution and 28 g urea.
Polymerization was initiated by the addition of 400 µl of
10% ammonium peroxodisulfate and 60 µl of TEMED. Gels were
poured using silanized glass plates and sharktooth combs and
run in 1x TBE buffer at 60 to 100 W for 2 to 4 hours
(depending on the region to be read). Gels were transferred
to Whatman blotting paper, dried at 80°C for about 1 hour,
and exposed to x-ray film at room temperature. Typical
exposure time was 12 hours. The following solutions were
used in these procedures: 5x Annealing buffer (200 mM Tris
HCl, pH 7.5, 100 mM MgCl2, 250 mM NaCl); Labelling Mix (7.5
µM each dCTP, dGTP, and dTTP); Termination Mixes (80 µM each
dNTP, 50 mM NaCl, 8 µM ddNTP (one each)); Stop solution (95%
formamide, 20 mM EDTA, 0.05% bromphenol blue, 0.05%
xylencyanol); 5x TBE (0.9 M Tris borate, 20 mM EDTA);
Polyacrylamide solution (96.7 g polyacrylamide, 3.3 g
bisacrylamide, 200 ml 1x TBE, 957 ml dH2O).

RNA isolation

Cytoplasmic RNA was isolated from calcium phosphate
transfected 293T cells 36 hours post transfection and from
vaccinia infected Hela cells 16 hours post infection
essentially as described by Gilman (Gilman, Preparation of
cytoplasmic RNA from tissue culture cells. In Current
Protocols in Molecular Biology, Ausubel et al., eds., Wiley
& Sons, New York, 1992). Briefly, cells were lysed in 400
µl lysis buffer, nuclei were spun out, and SDS and
proteinase K were added to 0.2% and 0.2 mg/ml respectively.
The cytoplasmic extracts were incubated at 37°C for 20 min,
phenol/chloroform extracted twice, and precipitated. The
RNA was dissolved in 100 µl buffer I and incubated at 37°C
for 20 min. The reaction was stopped by adding 25 µl stop
buffer and precipitated again.

The following solutions were used in this procedure:
Lysis Buffer (TE containing 50 mM Tris pH 8.0, 100
mM NaCl, 5 mM MgCl2, 0.5% NP40); Buffer I (TE buffer
with 10 mM MgCl2, 1 mM DTT, 0.5 U/µl placental RNAse
inhibitor, 0.1 U/µl RNAse free DNAse I); Stop buffer (50 mM
EDTA, 1.5 M NaOAc, 1.0% SDS).

Slot blot analysis

For slot blot analysis 10 µg of cytoplasmic RNA was
dissolved in 50 µl dH2O to which 150 µl of 10x SSC/18%
formaldehyde were added. The solubilized RNA was then
incubated at 65°C for 15 min and spotted onto nitrocellulose
membranes with a slot blot apparatus. Radioactively labeled
probes of 1.5 kb gp120IIIb and syngp120mn fragments were used
for hybridization. Each of the two fragments was random labeled
in a 50 µl reaction with 10 µl of 5x oligo-labeling buffer,
8 µl of 2.5 mg/ml BSA, 4 µl of [α-32P]-dCTP (20 µCi/µl; 6000
Ci/mmol), and 5 U of Klenow fragment. After 1 to 3 hours
incubation at 37°C, 100 µl of TE were added and
unincorporated [α-32P]-dCTP was eliminated using a G50 spin
column. Activity was measured in a Beckman beta-counter,
and equal specific activities were used for hybridization.
Membranes were pre-hybridized for 2 hours and hybridized for
12 to 24 hours at 42°C with 0.5 x 10^6 cpm probe per ml
hybridization fluid. The membrane was washed twice (5 min)
with washing buffer I at room temperature, for one hour in
washing buffer II at 65°C, and then exposed to x-ray film.
Similar results were obtained using a 1.1 kb NotI/SfiI
fragment of pCDM7 containing the 3' untranslated region.
Control hybridizations were done in parallel with a
random-labeled human beta-actin probe. RNA expression was
quantitated by scanning the hybridized nitrocellulose
membranes with a Magnetic Dynamics phosphorimager.

The following solutions were used in this procedure:
5x Oligo-labeling buffer (250 mM Tris HCl, pH 8.0, 25 mM
MgCl2, 5 mM β-mercaptoethanol, 2 mM dATP, 2 mM dGTP, mM
dTTP, 1 M Hepes pH 6.6, 1 mg/ml hexanucleotides [dNTP]6);
Hybridization Solution (0.05 M sodium phosphate, 250 mM NaCl,
7% SDS, 1 mM EDTA, 5% dextran sulfate, 50% formamide, 100
µg/ml denatured salmon sperm DNA); Washing buffer I (2x SSC,
0.1% SDS); Washing buffer II (0.5x SSC, 0.1% SDS); 20x SSC
(3 M NaCl, 0.3 M Na3 citrate, pH adjusted to 7.0).

Vaccinia recombination 

Vaccinia recombination used a modification of
the method described by Romeo and Seed (Romeo and Seed,
Cell, 64:1037, 1991). Briefly, CV1 cells at 70 to 90%
confluency were infected with 1 to 3 µl of a wild-type
vaccinia stock WR (2 x 10^8 pfu/ml) for 1 hour in culture
medium without calf serum. After 24 hours, the cells were
transfected by calcium phosphate with 25 µg TKG plasmid DNA
per dish. After an additional 24 to 48 hours the cells were
scraped off the plate, spun down, and resuspended in a
volume of 1 ml. After 3 freeze/thaw cycles trypsin was
added to 0.05 mg/ml and lysates were incubated for 20 min.
A dilution series of 10, 1 and 0.1 µl of this lysate was
used to infect small dishes (6 cm) of CV1 cells that had
been pretreated with 12.5 µg/ml mycophenolic acid, 0.25
mg/ml xanthine and 1.36 mg/ml hypoxanthine for 6 hours.
Infected cells were cultured for 2 to 3 days, and
subsequently stained with the monoclonal antibody NEA9301
against gp120 and an alkaline phosphatase conjugated
secondary antibody. Cells were incubated with 0.33 mg/ml
NBT and 0.16 mg/ml BCIP in AP-buffer and finally overlaid
with 1% agarose in PBS. Positive plaques were picked and
resuspended in 100 µl Tris pH 9.0. The plaque purification
was repeated once. To produce high titer stocks the
infection was slowly scaled up. Finally, one large plate of
Hela cells was infected with half of the virus of the
previous round. Infected cells were detached in 3 ml of
PBS, lysed with a Dounce homogenizer and cleared from larger
debris by centrifugation. VPE-8 recombinant vaccinia stocks
were kindly provided by the AIDS repository, Rockville, MD,
and express HIV-1 IIIB gp120 under the 7.5 mixed early/late
promoter (Earl et al., J. Virol., 65:31, 1991). In all
experiments with recombinant vaccinia, cells were infected at
a multiplicity of infection of at least 10.
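For orientation, the relation between titer, inoculum volume, and multiplicity of infection (MOI) can be sketched as below. The wild-type stock titer (2 x 10^8 pfu/ml) and the minimum MOI of 10 come from the text; the cell number per dish is a hypothetical value, and reusing the wild-type titer for a recombinant stock is an assumption made only for illustration, since recombinant titers are not stated.

```python
def pfu_in_volume(titer_pfu_per_ml, volume_ul):
    """Plaque-forming units contained in a given volume of virus stock."""
    return titer_pfu_per_ml * volume_ul / 1000.0

def volume_ul_for_moi(moi, n_cells, titer_pfu_per_ml):
    """Stock volume (in ul) needed to infect n_cells at the requested MOI."""
    return moi * n_cells / titer_pfu_per_ml * 1000.0

WR_TITER = 2e8          # pfu/ml, wild-type WR stock as stated in the text
CELLS_PER_DISH = 1e6    # hypothetical cell count, not given in the text

# pfu delivered by the 1 to 3 ul inoculum described in the text
low, high = pfu_in_volume(WR_TITER, 1), pfu_in_volume(WR_TITER, 3)
print(f"1 to 3 ul of stock carry {low:.0e} to {high:.0e} pfu")

# volume needed for MOI 10, assuming (hypothetically) the same titer
needed = volume_ul_for_moi(10, CELLS_PER_DISH, WR_TITER)
print(f"MOI 10 on {CELLS_PER_DISH:.0e} cells needs about {needed:g} ul")
```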

The following solution was used in this procedure:
AP buffer (100 mM Tris HCl, pH 9.5, 100 mM NaCl, 5 mM MgCl2).

Cell culture 

The monkey kidney carcinoma cell lines CV1 and Cos7, 
the human kidney carcinoma cell line 293T, and the human 
cervix carcinoma cell line Hela were obtained from the 
American Tissue Typing Collection and were maintained in 
supplemented IMDM. They were kept on 10 cm tissue culture 
plates and typically split 1:5 to 1:20 every 3 to 4 days. 
The following medium was used in this procedure:
Supplemented IMDM (90% Iscove's modified Dulbecco Medium,
10% calf serum, iron-complemented, heat inactivated 30 min
56°C, 0.3 mg/ml L-glutamine, 25 µg/ml gentamycin, 0.5 mM
β-mercaptoethanol (pH adjusted with 5 M NaOH, 0.5 ml)).

Transfection

Calcium phosphate transfection of 293T cells was
performed by slowly adding, under vortexing, 10 µg plasmid
DNA in 250 µl 0.25 M CaCl2 to the same volume of 2x HEBS
buffer. After incubation for 10 to 30 min
at room temperature the DNA precipitate was added to a small
dish of 50 to 70% confluent cells. In cotransfection
experiments with rev, cells were transfected with 10 µg
gp120IIIb, gp120IIIbrre, syngp120mnrre or rTHY-1enveg1rre
and 10 µg of pCMVrev or CDM7 plasmid DNA.

The following solutions were used in this procedure:
2x HEBS buffer (280 mM NaCl, 10 mM KCl, 1.5 * sterile
filtered); 0.25 mM CaCl2 (autoclaved).

Immunoprecipitation 

After 48 to 60 hours medium was exchanged and cells
were incubated for an additional 12 hours in Cys/Met-free
medium containing 200 µCi of 35S-translabel. Supernatants
were harvested and spun for 15 min at 3000 rpm to remove
debris. After addition of the protease inhibitors leupeptin,
aprotinin and PMSF to 2.5 µg/ml, 50 µg/ml, and 100 µg/ml
respectively, 1 ml of supernatant was incubated with either
10 µl of packed protein A sepharose alone (rTHY-1enveg1rre)
or with protein A sepharose and 3 µg of a purified
CD4/immunoglobulin fusion protein (kindly provided by
Behring) (all gp120 constructs) at 4°C for 12 hours on a
rotator. Subsequently the protein A beads were washed 5
times for 5 to 15 min each time. After the final wash 10 µl
of loading buffer was added, samples were boiled
for 3 min and applied on 7% (all gp120 constructs) or 10%
(rTHY-1enveg1rre) SDS polyacrylamide gels (TRIS pH 8.8
buffer in the resolving, TRIS pH 6.8 buffer in the stacking
gel, TRIS-glycine running buffer, Maniatis et al., supra
1989). Gels were fixed in 10% acetic acid and 10%
methanol, incubated with Amplify for 20 min, dried and
exposed for 12 hours.

The following buffers and solutions were used in
this procedure: Wash buffer (100 mM Tris, pH 7.5, 150 mM
NaCl, 5 mM CaCl2, 1% NP-40); 5x Running Buffer (125 mM Tris,
1.25 M Glycine, 0.5% SDS); Loading buffer (10% glycerol, 4%
SDS, 4% β-mercaptoethanol, 0.02% bromphenol blue).

Immunofluorescence

293T cells were transfected by calcium phosphate
coprecipitation and analyzed for surface THY-1 expression
after 3 days. After detachment with 1 mM EDTA/PBS, cells
were stained with the monoclonal antibody OX-7 in a dilution
of 1:250 at 4°C for 20 min, washed with PBS and subsequently
incubated with a 1:500 dilution of a FITC-conjugated goat
anti-mouse immunoglobulin antiserum. Cells were washed
again, resuspended in 0.5 ml of a fixing solution, and
analyzed on an EPICS XL cytofluorometer (Coulter).

The following solutions were used in this procedure:
PBS (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.4 mM KH2PO4,
pH adjusted to 7.4); Fixing solution (2% formaldehyde in
PBS).

ELISA 

The concentration of gp120 in culture supernatants
was determined using CD4-coated ELISA plates and goat
anti-gp120 antisera in the soluble phase. Supernatants of 293T
cells transfected by calcium phosphate were harvested after
4 days, spun at 3000 rpm for 10 min to remove debris and
incubated for 12 hours at 4°C on the plates. After 6 washes
with PBS, 100 µl of goat anti-gp120 antisera diluted 1:200
were added for 2 hours. The plates were washed again and
incubated for 2 hours with a peroxidase-conjugated rabbit
anti-goat IgG antiserum 1:1000. Subsequently the plates
were washed and incubated for 30 min with 100 µl of
substrate solution containing 2 mg/ml o-phenylenediamine in
sodium citrate buffer. The reaction was finally stopped
with 100 µl of 4 M sulfuric acid. Plates were read at 490
nm with a Coulter microplate reader. Purified recombinant
gp120IIIb was used as a control. The following buffers and
solutions were used in this procedure: Wash buffer (0.1%
NP40 in PBS); Substrate solution (2 mg/ml o-phenylenediamine
in sodium citrate buffer).

EXAMPLE 2

A Synthetic Green Fluorescent Protein Gene 

The efficacy of codon replacement for gp120 suggests
that replacing non-preferred codons with less preferred
codons or preferred codons (and replacing less preferred
codons with preferred codons) will increase expression in
mammalian cells of other proteins, e.g., other eukaryotic
proteins.

The green fluorescent protein (GFP) of the jellyfish
Aequorea victoria (Ward, Photochem. Photobiol. 4:1, 1979;
Prasher et al., Gene 111:229, 1992; Cody et al., Biochem.
32:1212, 1993) has attracted attention recently for its
possible utility as a marker or reporter for transfection
and lineage studies (Chalfie et al., Science 263:802, 1994).

Examination of a codon usage table constructed from
the native coding sequence of GFP showed that the GFP codons
favored either A or U in the third position. The bias in
this case favors A less than does the bias of gp120, but is
substantial. A synthetic gene was created in which the
natural GFP sequence was re-engineered in much the same
manner as for gp120 (FIG. 11; SEQ ID NO:40). In addition,
the translation initiation sequence of GFP was replaced with
sequences corresponding to the translational initiation
consensus. The expression of the resulting protein was
contrasted with that of the wild type sequence, similarly
engineered to bear an optimized translational initiation
consensus (FIG. 10B and FIG. 10C). In addition, the effect
of inclusion of the mutation Ser65→Thr, reported to improve
excitation efficiency of GFP at 490 nm and hence preferred
for fluorescence microscopy (Heim et al., Nature 373:663,
1995), was examined (FIG. 10D). Codon engineering conferred
a significant increase in expression efficiency (and
concomitant percentage of cells apparently positive for
transfection), and the combination of the Ser65→Thr
mutation and codon optimization resulted in a DNA segment
encoding a highly visible mammalian marker protein (FIG.
10D).
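The third-position bias mentioned above can be measured with a short script. The following is a minimal Python sketch, not the procedure used to build the codon usage table in the text; the function names are hypothetical and the input is a toy sequence, not the GFP coding sequence.

```python
from collections import Counter

def third_position_counts(cds):
    """Count the base found at the third position of each complete codon."""
    cds = cds.lower().replace(" ", "")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    return Counter(cds[i + 2] for i in range(0, usable, 3))

def at_fraction(counts):
    """Fraction of codons ending in A or T (A or U in the mRNA)."""
    total = sum(counts.values())
    return (counts["a"] + counts["t"]) / total if total else 0.0

# Toy sequence for illustration only (hypothetical, not taken from the text).
toy_cds = "atg aaa gaa ttt tta"
counts = third_position_counts(toy_cds)
print(dict(counts), f"A/T at third position: {at_fraction(counts):.0%}")
```

Applied to a native and a re-engineered coding sequence, the same two functions would give a quick before/after comparison of third-position composition.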

The above-described synthetic green fluorescent
protein coding sequence was assembled in a similar manner as
for gp120 from six fragments of approximately 120 bp each,
using a strategy for assembly that relied on the ability of
the restriction enzymes BsaI and BbsI to cleave outside of
their recognition sequence. Long oligonucleotides were
synthesized which contained portions of the coding sequence
for GFP embedded in flanking sequences encoding EcoRI and
BsaI at one end, and BamHI and BbsI at the other end. Thus,
each oligonucleotide has the configuration EcoRI/BsaI/GFP
fragment/BbsI/BamHI. The restriction site ends generated by
the BsaI and BbsI sites were designed to yield compatible
ends that could be used to join adjacent GFP fragments.
Each of the compatible ends was designed to be unique and
non-self-complementary. The crude synthetic DNA segments
were amplified by PCR, inserted between EcoRI and BamHI in
pUC9, and sequenced. Subsequently the intact coding
sequence was assembled in a six fragment ligation, using
insert fragments prepared with BsaI and BbsI. Two of six
plasmids resulting from the ligation bore an insert of
correct size, and one contained the desired full length
sequence. Mutation of Ser65 to Thr was accomplished by
standard PCR based mutagenesis, using a primer that
overlapped a unique BssSI site in the synthetic GFP.
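The design rule stated above, that each compatible end be unique and non-self-complementary, can be checked mechanically. The sketch below is one possible way to do so in Python; the function names and the example overhangs are hypothetical, since the actual junction sequences are not listed in this passage.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def check_overhangs(overhangs):
    """Return a list of problems found in a set of junction overhangs.

    An overhang is rejected if it is self-complementary (it could ligate
    to itself) or if it equals, or is the reverse complement of, an
    overhang used at an earlier junction (it could join the wrong pair).
    """
    problems, seen = [], set()
    for oh in overhangs:
        rc = reverse_complement(oh)
        if oh == rc:
            problems.append(f"{oh} is self-complementary")
        if oh in seen or rc in seen:
            problems.append(f"{oh} clashes with an earlier overhang")
        seen.add(oh)
    return problems

# Five junctions join six fragments; these 4-base ends are hypothetical.
good = ["aagc", "ctga", "tacg", "ggat", "cttg"]
bad = ["gatc", "aagc", "gctt"]
print(check_overhangs(good))
print(check_overhangs(bad))
```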

Codon optimization as a strategy for improved expression in
mammalian cells

The data presented here suggest that coding sequence
re-engineering may have general utility for the improvement
of expression of mammalian and non-mammalian eukaryotic
genes in mammalian cells. The results obtained here with
three unrelated proteins (HIV gp120, the rat cell surface
antigen Thy-1 and green fluorescent protein from Aequorea
victoria), and with human Factor VIII (see below), suggest
that codon optimization may prove to be a fruitful strategy
for improving the expression in mammalian cells of a wide
variety of eukaryotic genes.

EXAMPLE III

Design of a Codon-Optimized Gene Expressing Human Factor
VIII Lacking the Central B Domain

A synthetic gene was designed that encodes mature
human Factor VIII lacking amino acid residues 760 to 1639,
inclusive (residues 779 to 1658, inclusive, of the
precursor). The synthetic gene was created by choosing
codons corresponding to those favored by highly expressed
human genes. Some deviation from strict adherence to the
favored residue pattern was made to allow unique restriction
enzyme cleavage sites to be introduced throughout the gene
to facilitate future manipulations. For preparation of the
synthetic gene the sequence was then divided into 28
segments of 150 basepairs, and a 29th segment of 161
basepairs.
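The figures in this paragraph can be cross-checked with a few lines of arithmetic. This Python sketch only restates numbers given in the text; the variable names are hypothetical.

```python
# Residues removed with the B domain, counted inclusively.
deleted_mature = 1639 - 760 + 1        # numbering of the mature protein
deleted_precursor = 1658 - 779 + 1     # numbering of the precursor

# Offset between precursor and mature numbering implied by the two ranges.
numbering_offset = 779 - 760

# Total length covered by the 29 synthesis segments.
total_bp = 28 * 150 + 161

print(f"deleted residues: {deleted_mature} (mature), {deleted_precursor} (precursor)")
print(f"numbering offset: {numbering_offset} residues")
print(f"segments cover {total_bp} bp")
```

Both numbering schemes describe the same 880-residue deletion, and the two ranges differ by a constant 19 residues.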

A synthetic gene expressing human Factor VIII
lacking the central B domain was constructed as follows.
Twenty-nine pairs of template oligonucleotides (see below)
were synthesized. The 5' template oligos were 105 bases
long and the 3' oligos were 104 bases long (except for the
last 3' oligo, which was 125 residues long). The template
oligos were designed so that each annealing pair, composed of
one 5' oligo and one 3' oligo, created a 19 basepair
double-stranded region.

To facilitate the PCR and subsequent manipulations,
the 5' ends of the oligo pairs were designed to be invariant
over the first 18 residues, allowing a common pair of PCR
primers to be used for amplification, and allowing the same
PCR conditions to be used for all pairs. The first 18
residues of each 5' member of the template pair were cgc gaa
ttc gga aga ccc (SEQ ID NO: 110) and the first 18 residues of
each 3' member of the template pair were: ggg gat cct cac
gtc tca (SEQ ID NO: 43).
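The oligo design above can be checked numerically. The Python sketch below derives the length of the extended duplex and of its unique portion from the stated oligo lengths, and scans the two invariant 18-base ends for restriction sites. The recognition sequences for EcoRI, BamHI, BbsI, and BsmBI are the standard ones and are supplied here as an assumption, as is the reading of the OCR token "tea" as "tca"; the function name is hypothetical.

```python
# Lengths stated in the text.
FIVE_PRIME_OLIGO = 105
THREE_PRIME_OLIGO = 104
ANNEALED_OVERLAP = 19
INVARIANT_END = 18

# Two oligos sharing a 19 bp duplex extend to one full-length duplex.
extended_length = FIVE_PRIME_OLIGO + THREE_PRIME_OLIGO - ANNEALED_OVERLAP
# Removing both invariant primer ends leaves the segment-specific portion.
unique_length = extended_length - 2 * INVARIANT_END

# Standard recognition sequences (assumed, not quoted from the text).
SITES = {"EcoRI": "gaattc", "BamHI": "ggatcc", "BbsI": "gaagac", "BsmBI": "cgtctc"}

def sites_in(seq):
    """Names of the enzymes whose recognition sequence occurs in seq."""
    seq = seq.lower().replace(" ", "")
    return sorted(name for name, site in SITES.items() if site in seq)

five_prime_end = "cgc gaa ttc gga aga ccc"    # SEQ ID NO: 110
three_prime_end = "ggg gat cct cac gtc tca"   # SEQ ID NO: 43, "tea" read as "tca"

print(extended_length, unique_length)
print(sites_in(five_prime_end), sites_in(three_prime_end))
```

The 154 bp result agrees with the length of the unique portion given later in the text for the first 28 oligonucleotide pairs.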

Pairs of oligos were annealed and then extended and
amplified by PCR in a reaction mixture as follows: templates
were annealed at 200 µg/ml each in PCR buffer (10 mM
Tris-HCl, 1.5 mM MgCl2, 50 mM KCl, 100 µg/ml gelatin, pH
8.3). The PCR reactions contained 2 ng of the annealed
template oligos, 0.5 µg of each of the two 18-mer primers
(described below), 200 µM of each of the deoxynucleoside
triphosphates, 10% by volume of DMSO and PCR buffer as
supplied by Boehringer Mannheim Biochemicals, in a final
volume of 50 µl. After the addition of Taq polymerase (2.5
units, 0.5 µl; Boehringer Mannheim Biochemicals)
amplifications were conducted on a Perkin-Elmer Thermal
Cycler for 25 cycles (94°C for 30 sec, 55°C for 30 sec, and
72°C for 30 sec). The final cycle was followed by a 10
minute extension at 72°C.

The amplified fragments were digested with EcoRI and 
BamHI (cleaving at the 5' and 3' ends of the fragments 
30 respectively) and ligated to a pUC9 derivative cut with 
EcoRI and BamHI. 

Individual clones were sequenced and a collection of
plasmids corresponding to the entire desired sequence was
identified. The clones were then assembled by multifragment
ligation taking advantage of restriction sites at the 3'
ends of the PCR primers, immediately adjacent to the
amplified sequence. The 5' PCR primer contained a BbsI
site, and the 3' PCR primer contained a BsmBI site,
positioned so that cleavage by the respective enzymes
preceded the first nucleotide of the amplified portion and
left a 4 base 5' overhang created by the first 4 bases of
the amplified portion. Simultaneous digestion with BbsI and
BsmBI thus liberated the amplified portion with unique 4
base 5' overhangs at each end which contained none of the
primer sequences. In general these overhangs were not
self-complementary, allowing multifragment ligation reactions
to produce the desired product with high efficiency. The
unique portion of the first 28 amplified oligonucleotide
pairs was thereby 154 basepairs, and after digestion each
gave rise to a 150 bp fragment with unique ends. The first
and last fragments were not manipulated in this manner,
however, since they had other restriction sites designed
into them to facilitate insertion of the assembled sequence
into an appropriate mammalian expression vector. The actual
assembly process proceeded as follows.

Assembly of the Synthetic Factor VIII Gene

Step 1: 29 Fragments Assembled to Form 10 Fragments.

The 29 pairs of oligonucleotides, which formed
segments 1 to 29 when base-paired, are described below.

Plasmids carrying segments 1, 5, 9, 12, 16, 20, 24
and 27 were digested with EcoRI and BsmBI and the 170 bp
fragments were isolated; plasmids bearing segments 2, 3, 6,
7, 10, 13, 17, 18, 21, 25, and 28 were digested with BbsI
and BsmBI and the 170 bp fragments were isolated; and
plasmids bearing segments 4, 8, 11, 14, 19, 22, 26 and 29
were digested with EcoRI and BbsI and the 2440 bp vector
fragment was isolated. Fragments bearing segments 1, 2, 3
and 4 were then ligated to generate segment "A"; fragments
bearing segments 5, 6, 7 and 8 were ligated to generate
segment "B"; fragments bearing segments 9, 10 and 11 were
ligated to generate segment "C"; fragments bearing segments
12, 13, and 14 were ligated to generate segment "D";
fragments bearing segments 16, 17, 18 and 19 were ligated to
generate segment "F"; fragments bearing segments 20, 21 and
22 were ligated to generate segment "G"; fragments bearing
segments 24, 25 and 26 were ligated to generate segment "I";
and fragments bearing segments 27, 28 and 29 were ligated to
generate segment "J".

Step 2: Assembly of the 10 resulting
Fragments from Step 1 to Three Fragments.

Plasmids carrying the segments "A", "D" and "G" were
digested with EcoRI and BsmBI, plasmids carrying the
segments B, 15, 23, and I were digested with BbsI and BsmBI,
and plasmids carrying the segments C, F, and J were digested
with EcoRI and BbsI. Fragments bearing segments A, B, and C
were ligated to generate segment "K"; fragments bearing
segments D, 15, and F were ligated to generate segment "O";
and fragments bearing segments G, 23, I, and J were ligated
to generate segment "P".

Step 3: Assembly of the Final Three Pieces.

The plasmid bearing segment K was digested with
EcoRI and BsmBI, the plasmid bearing segment O was digested
with BbsI and BsmBI, and the plasmid bearing segment P was
digested with EcoRI and BbsI. The three resulting fragments
were ligated to generate segment S.
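The three assembly rounds described in Steps 1 to 3 form a small tree, which can be written down and checked for completeness. The Python sketch below encodes the groupings exactly as listed in the text (segments 15 and 23 enter at Step 2 as single segments) and confirms that expanding the final product yields segments 1 to 29 in order. The data layout and function name are hypothetical, and the name "S" for the final product follows its use in Step 4.

```python
# Step 1: single segments ligated into lettered intermediates.
STEP1 = {
    "A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11],
    "D": [12, 13, 14], "F": [16, 17, 18, 19], "G": [20, 21, 22],
    "I": [24, 25, 26], "J": [27, 28, 29],
}
# Step 2: intermediates (plus segments 15 and 23) ligated into K, O, P.
STEP2 = {"K": ["A", "B", "C"], "O": ["D", 15, "F"], "P": ["G", 23, "I", "J"]}
# Step 3: the final three pieces.
STEP3 = {"S": ["K", "O", "P"]}

def expand(part, tables):
    """Recursively expand a named piece into its ordered list of segments."""
    if isinstance(part, int):
        return [part]
    for table in tables:
        if part in table:
            segments = []
            for child in table[part]:
                segments.extend(expand(child, tables))
            return segments
    raise KeyError(f"unknown piece: {part}")

order = expand("S", [STEP1, STEP2, STEP3])
print(order)
```

Counting the Step 2 inputs (eight lettered intermediates plus segments 15 and 23) gives the 10 fragments named in the Step 1 heading.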

Step 4: Insertion of the Synthetic Gene in a
Mammalian Expression Vector.

The plasmid bearing segment S was digested with NheI
and NotI and inserted between NheI and EagI sites of plasmid
CD51NEg1 to generate plasmid cd51sf8b-.

Sequencing and Correction of the Synthetic Factor VIII Gene

After assembly of the synthetic gene it was
discovered that there were two undesired residues encoded in
the sequence. One was an Arg residue at 749, which is
present in the GenBank sequence entry originating from
Genentech but is not in the sequence reported by Genentech
in the literature. The other was an Ala residue at 146,
which should have been Pro. This mutation arose at an
unidentified step subsequent to the sequencing of the 29
constituent fragments. The Pro749Arg mutation was corrected
by incorporating the desired change in a PCR primer (ctg ctt
ctg acg cgt gct ggg gtg gcg gga gtt; SEQ ID NO: 44) that
included the MluI site at position 2335 of the sequence
below (sequence of HindIII to NotI segment) and amplifying
between that primer and a primer (ctg ctg aaa gtc tcc agc
tgc; SEQ ID NO: 44) 5' to the SgrAI site at 2225. The SgrAI
to MluI fragment was then inserted into the expression
vector at the cognate sites in the vector, and the resulting
correct sequence change verified by sequencing. The
Pro146Ala mutation was corrected by incorporating the
desired sequence change in an oligonucleotide (ggc agg tgc
tta agg aga acg gcc cta tgg cca; SEQ ID NO: 46) bearing the
AflII site at residue 504, and amplifying the fragment
resulting from PCR reaction between that oligo and the
primer having sequence cgt tgt tct tca tac gcg tct ggg gct
cct cgg ggc (SEQ ID NO: 109), cutting the resulting PCR
fragment with AflII and AvrII (at residue 989), inserting
the corrected fragment into the expression vector and
confirming the construction by sequencing.

Construction of a Matched Native Gene Expressing Human
Factor VIII Lacking the Central B Domain

A matched Factor VIII B domain deletion expression
plasmid having the native codon sequence was constructed by
introducing NheI at the 5' end of the mature coding sequence
using primer cgc caa ggg cta gcc gcc acc aga aga tac tac ctg
ggt (SEQ ID NO: 47), amplifying between that primer and the
primer att cgt agt tgg ggt tcc tct gga cag (corresponding to
residues 1067 to 1093 of the sequence shown below), cutting
with NheI and AflII (residue 345 in the sequence shown
below) and inserting the resulting fragment into an
appropriately cleaved plasmid bearing native Factor VIII.
The B domain deletion was created by overlap PCR using ctg
tat ttg atg aga acc g (corresponding to residues 1813 to
1831 below) and caa gac tgg tgg ggt ggc att aaa ttg ctt t
(SEQ ID NO: 48) (2342 to 2372 on complement below) for the 5'
end of the overlap, and aat gcc acc cca cca gtc ttg aaa cgc
ca (SEQ ID NO: 49) (2352 to 2380 on sequence below) and cat
ctg gat att gca ggg ag (SEQ ID NO: 50) (3145 to 3164) for
the 3' end of the overlap. The
products of the two individual PCR reactions were then mixed
and reamplified by use of the outermost primers, the
resulting fragment cleaved by Asp718 (KpnI isoschizomer,
1837 on sequence below) and PflMI (3100 on sequence below),
and inserted into the appropriately cleaved expression
plasmid bearing native Factor VIII.
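
Overlap PCR relies on the two inner primers sharing a complementary stretch, so that the two first-round products can anneal to each other and be extended. The sketch below checks this for SEQ ID NO: 48 and SEQ ID NO: 49 as transcribed here (with the scanning error "gec" read as "gcc"). The helper names `revcomp` and `shared_overlap` are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_overlap(upstream: str, downstream: str) -> str:
    """Longest suffix of `upstream` that is also a prefix of `downstream`."""
    for n in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream.endswith(downstream[:n]):
            return downstream[:n]
    return ""


# Inner primers with spaces removed. SEQ ID NO: 48 is written on the
# complement strand, so it is reverse-complemented before comparison.
seq48 = "caa gac tgg tgg ggt ggc att aaa ttg ctt t".replace(" ", "")
seq49 = "aat gcc acc cca cca gtc ttg aaa cgc ca".replace(" ", "")

overlap = shared_overlap(revcomp(seq48), seq49)
print(len(overlap), overlap)
```

The shared stretch found this way is 21 nucleotides long, which agrees with the stated coordinates: 2342 to 2372 and 2352 to 2380 have positions 2352 to 2372 in common.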

The complete sequence (SEQ ID NO: 41) of the native
human Factor VIII gene deleted for the central B region is
presented in Figure 12. The complete sequence (SEQ ID
NO: 42) of the synthetic Factor VIII gene deleted for the
central B region is presented in Figure 13.



Preparation and assay of expression plasmids

Two independent plasmid isolates of the native, and
four independent isolates of the synthetic Factor VIII
expression plasmid were separately propagated in bacteria
and their DNA prepared by CsCl buoyant density
centrifugation followed by phenol extraction. Analysis of
the supernatants of COS cells transfected with the plasmids
showed that the synthetic gene gave rise to approximately
four times as much Factor VIII as did the native gene.

COS cells were then transfected with 5 µg of each
Factor VIII construct per 6 cm dish using the DEAE-dextran
method. At 72 hours post-transfection, 4 ml of fresh medium
containing 10% calf serum was added to each plate. A
sample of media was taken from each plate 12 hr later.
Samples were tested by ELISA using mouse anti-human Factor
VIII light chain monoclonal antibody and
peroxidase-conjugated goat anti-human Factor VIII polyclonal
antibody. Purified human plasma Factor VIII was used as a
standard. Cells transfected with the synthetic Factor VIII
gene construct expressed 138 ± 20.2 ng/ml (equivalent ng/ml
non-deleted Factor VIII) of Factor VIII (n=4) while the
cells transfected with the native Factor VIII gene expressed
33.5 ± 0.7 ng/ml (equivalent ng/ml non-deleted Factor VIII)
of Factor VIII (n=2).
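
The "approximately four times" figure can be checked directly from the two ELISA means. The snippet below is a small illustrative calculation using only the values reported above; the variable names are assumptions, and no error propagation is attempted because the text does not say whether the ± values are standard deviations or standard errors.

```python
# Mean Factor VIII levels reported in the ELISA paragraph (ng/ml).
synthetic_mean = 138.0  # synthetic gene construct, n = 4
native_mean = 33.5      # native gene construct, n = 2

fold = synthetic_mean / native_mean
percent_of_native = 100 * fold

print(f"fold increase: {fold:.1f}x")
print(f"synthetic as percent of native: {percent_of_native:.0f}%")
```

The ratio comes out slightly above four, in line with the statement that the synthetic gene gave approximately four times as much Factor VIII as the native gene.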

The following template oligonucleotides were used
for construction of the synthetic Factor VIII gene.

rl bbs 1 for (gcta)
cgc gaa ttc gga aga ccc gct agc cgc cac    1 rl
ccg ccg cta cta cct ggg cgc cgt gga gct
gtc ctg gga cta cat gca gag cga cct ggg
cga gct ccc cgt gga (SEQ ID NO: 51)

ggg gat cct cac gtc tca ggt ttt ctt gta    1 bam
cac cac gct ggt gtt gaa ggg gaa gct ctt
ggg cac gcg ggg ggg gaa gcg ggc gtc cac
ggg gag ctc gcc ca (SEQ ID NO: 52)

rl bbs 2 for (aacc)
cgc gaa ttc gga aga ccc aac cct gtt cgt    2 rl
gga gtt cac cga cca cct gtt caa cat tgc
caa gcc gcg ccc ccc ctg gat ggg cct gct
ggg ccc cac cat cca (SEQ ID NO: 53)

ggg gat cct cac gtc tca gtg cag gct gac    2 bam
ggg gtg gct ggc cat gtt ctt cag ggt gat
cac cac ggt gtc gta cac ctc ggc ctg gat
ggt ggg gcc cag ca (SEQ ID NO: 54)

rl bbs 3 for (gcac)
cgc gaa ttc gga aga ccc gca cgc cgt ggg    3 rl
cgt gag cta ctg gaa ggc cag cga ggg cgc
cga gta cga cga cca gac gtc cca gcg cga
gaa gga gga cga caa (SEQ ID NO: 55)

ggg gat cct cac gtc tca gct ggc cat agg    3 bam
gcc gtt ctc ctt aag cac ctg cca cac gta
ggt gtg gct ccc ccc cgg gaa cac ctt gtc
gtc ctc ctt ctc gc (SEQ ID NO: 56)

rl bbs 4 for (cagc)
cgc gaa ttc gga aga ccc cag cga ccc cct    4 rl
gtg cct gac cta cag cta cct gag cca cgt
gga cct ggt gaa gga tct gaa cag cgg gct
gat cgg cgc cct gct (SEQ ID NO: 57)

ggg gat cct cac gtc tca gaa cag cag gat    4 bam
gaa ctt gtg cag ggt ctg ggt ttt ctc ctt
ggc cag gct gcc ctc gcg aca cac cag cag
ggc gcc gat cag cc (SEQ ID NO: 58)

rl bbs 5 for (gttc)
cgc gaa ttc gga aga ccc gtt cgc cgt gtt    5 rl
cga cga ggg gaa gag ctg gca cag cga gac
taa gaa cag cct gat gca gga ccg cga cgc
cgc cag cgc ccg cgc (SEQ ID NO: 59)

ggg gat cct cac gtc tca gtg gca gcc gat    5 bam
cag gcc ggg cag gct gcg gtt cac gta gcc
gtt aac ggt gtg cat ctt ggg cca ggc gcg
ggc gct ggc ggc gt (SEQ ID NO: 60)

rl bbs 6 for (ccac)
cgc gaa ttc gga aga ccc cca ccg caa gag    6 rl
cgt gta ctg gca cgt cat cgg cat ggg cac
cac ccc tga ggt gca cag cat ctt cct gga
ggg cca cac ctt cct (SEQ ID NO: 61)

ggg gat cct cac gtc tca cag ggt ctg ggc    6 bam
agt cag gaa ggt gat ggg gct gat ctc cag
gct ggc ctg gcg gtg gtt gcg cac cag gaa
ggt gtg gcc ctc ca (SEQ ID NO: 62)

rl bbs 7 for (cctg)
cgc gaa ttc gga aga ccc cct gct gat gga    7 rl
cct agg cca gtt cct gct gtt ctg cca cat
cag cag cca cca gca cga cgg cat gga ggc
tta cgt gaa ggt gga (SEQ ID NO: 63)

ggg gat cct cac gtc tca gtc gtc gtc gta    7 bam
gtc ctc ggc ctc ctc gtt gtt ctt cat gcg
cag ctg ggg ctc ctc ggg gca gct gtc cac
ctt cac gta agc ct (SEQ ID NO: 64)

rl bbs 8 for (cgac)
cgc gaa ttc gga aga ccc cga cct gac cga    8 rl
cag cga gat gga tgt cgt acg ctt cga cga
cga caa cag ccc cag ctt cat cca gat ccg
cag cgt ggc caa gaa (SEQ ID NO: 65)

ggg gat cct cac gtc tca tac tag cgg ggc    8 bam
gta gtc cca gtc ctc ctc ctc ggc ggc gat
gta gtg cac cca ggt ctt agg gtg ctt ctt
ggc cac gct gcg ga (SEQ ID NO: 66)

rl bbs 9 for (agta)
cgc gaa ttc gga aga ccc agt act ggc ccc    9 rl
cga cga ccg cag cta caa gag cca gta cct
gaa caa cgg ccc cca gcg cat cgg ccg caa
gta caa gaa ggt gcg (SEQ ID NO: 67)

ggg gat cct cac gtc tca gag gat gcc gga    9 bam
ctc gtg ctg gat ggc ctc gcg ggt ctt gaa
agt ctc gtc ggt gta ggc cat gaa gcg cac
ctt ctt gta ctt gc (SEQ ID NO: 68)

rl bbs 10 for (cctc)
cgc gaa ttc gga aga ccc cct cgg ccc cct    10 rl
gct gta cgg cga ggt ggg cga cac cct gct
gat cat ctt caa gaa cca ggc cag cag gcc
cta caa cat cta ccc (SEQ ID NO: 69)

ggg gat cct cac gtc tca ctt cag gtg ctt    10 bam
cac gcc ctt ggg cag gcg gcg gct gta cag
ggg gcg cac gtc ggt gat gcc gtg ggg gta
gat gtt gta ggg cc (SEQ ID NO: 70)

rl bbs 11 for (gaag)
cgc gaa ttc gga aga ccc gaa gga ctt ccc    11 rl
cat cct gcc cgg cga gat ctt caa gta caa
gtg gac cgt gac cgt gga gga cgg ccc cac
caa gag cga ccc ccg (SEQ ID NO: 71)

ggg gat cct cac gtc tca gcc gat cag tcc    11 bam
gga ggc cag gtc gcg ctc cat gtt cac gaa
gct gct gta gta gcg ggt cag gca gcg ggg
gtc gct ctt ggt gg (SEQ ID NO: 72)

rl bbs 12 for (cggc)
cgc gaa ttc gga aga ccc cgg ccc cct gct    12 rl
gat ctg cta caa gga gag cgt gga cca gcg
cgg caa cca gat cat gag cga caa gcg caa
cgt gat cct gtt cag (SEQ ID NO: 73)

ggg gat cct cac gtc tca agc ggg gtt ggg    12 bam
cag gaa gcg ctg gat gtt ctc ggt cag ata
cca gct gcg gtt ctc gtc gaa cac gct gaa
cag gat cac gtt gc (SEQ ID NO: 74)

rl bbs 13 for (cgct)
cgc gaa ttc gga aga ccc cgc tgg cgt gca    13 rl
gct gga aga tcc cga gtt cca ggc cag caa
cat cat gca cag cat caa cgg cta cgt gtt
cga cag cct gca gct (SEQ ID NO: 75)



ggg gat cct cac gtc tca cag gaa gtc ggt    13 bam
ctg ggc gcc gat gct cag gat gta cca gta
ggc cac ctc atg cag gca cac gct cag ctg
cag gct gtc gaa ca (SEQ ID NO: 76)

rl bbs 14 for (cctg)
cgc gaa ttc gga aga ccc cct gag cgt gtt    14 rl
ctt ctc cgg gta tac ctt caa gca caa gat
ggt gta cga gga cac cct gac cct gtt ccc
ctt ctc cgg cga gac (SEQ ID NO: 77)

ggg gat cct cac gtc tca gtt gcg gaa gtc    14 bam
gct gtt gtg gca gcc cag aat cca cag gcc
ggg gtt ctc cat aga cat gaa cac agt ctc
gcc gga gaa ggg ga (SEQ ID NO: 78)

rl bbs 15 for (caac)
cgc gaa ttc gga aga ccc caa ccg cgg cat    15 rl
gac tgc cct gct gaa agt ctc cag ctg cga
caa gaa cac cgg cga cta cta cga gga cag
cta cga gga cat ctc (SEQ ID NO: 79)

ggg gat cct cac gtc tca gcg gtg gcg gga    15 bam
gtt ttg gga gaa gga gcg ggg ctc gat ggc
gtt gtt ctt gga cag cag gta ggc gga gat
gtc ctc gta gct gt (SEQ ID NO: 80)

rl bbs 16 for (ccgc)
cgc gaa ttc gga aga ccc ccg cag cac gcg    16 rl
tca gaa gca gtt caa cgc cac ccc ccc cgt
gct gaa gcg cca cca gcg cga gat cac ccg
cac cac cct gca aag (SEQ ID NO: 81)

ggg gat cct cac gtc tca gat gtc gaa gtc    16 bam
ctc ctt ctt cat ctc cac gct gat ggt gtc
gtc gta gtc gat ctc ctc ctg gtc gct ttg
cag ggt ggt gcg gg (SEQ ID NO: 82)

rl bbs 17 for (catc)
cgc gaa ttc gga aga ccc cat cta cga cga    17 rl
gga cga gaa cca gag ccc ccg ctc ctt cca
aaa gaa aac ccg cca cta ctt cat cgc cgc
cgt gga gcg cct gtg (SEQ ID NO: 83)

ggg gat cct cac gtc tca ctg ggg cac gct    17 bam
gcc gct ctg ggc gcg gtt gcg cag gac gtg
ggg gct gct gct cat gcc gta gtc cca cag
gcg ctc cac ggc gg (SEQ ID NO: 84)

rl bbs 18 for (ccag)
cgc gaa ttc gga aga ccc cca gtt caa gaa    18 rl
ggt ggt gtt cca gga gtt cac cga cgg cag
ctt cac cca gcc cct gta ccg cgg cga gct
gaa cga gca cct ggg (SEQ ID NO: 85)

ggg gat cct cac gtc tca ggc ttg gtt gcg    18 bam
gaa ggt cac cat gat gtt gtc ctc cac ctc
ggc gcg gat gta ggg gcc gag cag gcc cag
gtg ctc gtt cag ct (SEQ ID NO: 86)

rl bbs 19 for (agcc)
cgc gaa ttc gga aga ccc agc ctc ccg gcc    19 rl
cta ctc ctt cta ctc ctc cct gat cag cta
cga gga gga cca gcg cca ggg cgc cga gcc
ccg caa gaa ctt cgt (SEQ ID NO: 87)

ggg gat cct cac gtc tca ctc gtc ctt ggt    19 bam
ggg ggc cat gtg gtg ctg cac ctt cca gaa
gta ggt ctt agt ctc gtt ggg ctt cac gaa
gtt ctt gcg ggg ct (SEQ ID NO: 88)

rl bbs 20 for (cgag)
cgc gaa ttc gga aga ccc cga gtt cga ctg    20 rl
caa ggc ctg ggc cta ctt cag cga cgt gga
cct gga gaa gga cgt gca cag cgg cct gat
cgg ccc cct gct ggt (SEQ ID NO: 89)

ggg gat cct cac gtc tca gaa cag ggc aaa    20 bam
ttc ctg cac agt cac ctg cct ccc gtg ggg
ggg gtt cag ggt gtt ggt gtg gca cac cag
cag ggg gcc gat ca (SEQ ID NO: 90)

rl bbs 21 for (gttc)
cgc gaa ttc gga aga ccc gtt ctt cac cat    21 rl
ctt cga cga gac taa gag ctg gta ctt cac
cga gaa cat gga gcg caa ctg ccg cgc ccc
ctg caa cat cca gat (SEQ ID NO: 91)

ggg gat cct cac gtc tca cag ggt gtc cat    21 bam
gat gta gcc gtt gat ggc gtg gaa gcg gta
gtt ctc ctt gaa ggt ggg atc ttc cat ctg
gat gtt gca ggg gg (SEQ ID NO: 92)

rl bbs 22 for (cctg)
cgc gaa ttc gga aga ccc cct gcc cgg cct    22 rl
ggt gat ggc cca gga cca gcg cat ccg ctg
gta cct gct gtc tat ggg cag caa cga gaa
cat cca cag cat cca (SEQ ID NO: 93)

ggg gat cct cac gtc tca gta cag gtt gta    22 bam
cag ggc cat ctt gta ctc ctc ctt ctt gcg
cac ggt gaa aac gtg gcc gct gaa gtg gat
gct gtg gat gtt ct (SEQ ID NO: 94)

rl bbs 23 for (gtac)
cgc gaa ttc gga aga ccc gta ccc cgg cgt    23 rl
gtt cga gac tgt gga gat gct gcc cag caa
ggc cgg gat ctg gcg cgt gga gtg cct gat
cgg cga gca cct gca (SEQ ID NO: 95)

ggg gat cct cac gtc tca gct ggc cat gcc    23 bam
cag ggg ggt ctg gca ctt gtt gct gta cac
cag gaa cag ggt gct cat gcc ggc gtg cag
gtg ctc gcc gat ca (SEQ ID NO: 96)

rl bbs 24 for (cagc)
cgc gaa ttc gga aga ccc cag cgg cca cat    24 rl
ccg cga ctt cca gat cac cgc cag cgg cca
gta cgg cca gtg ggc tcc caa gct ggc ccg
cct gca cta cag cgg (SEQ ID NO: 97)

ggg gat cct cac gtc tca cat ggg ggc cag    24 bam
cag gtc cac ctt gat cca gga gaa ggg ctc
ctt ggt cga cca ggc gtt gat gct gcc gct
gta gtg cag gcg gg (SEQ ID NO: 98)

rl bbs 25 for (catg)
cgc gaa ttc gga aga ccc cat gat cat cca    25 rl
cgg cat caa gac cca ggg cgc ccg cca gaa
gtt cag cag cct gta cat cag cca gtt cat
cat cat gta ctc tct (SEQ ID NO: 99)

ggg gat cct cac gtc tca gtt gcc gaa gaa    25 bam
cac cat cag ggt gcc ggt gct gtt gcc gcg
gta ggt ctg cca ctt ctt gcc gtc tag aga
gta cat gat gat ga (SEQ ID NO: 100)

rl bbs 26 for (caac)
cgc gaa ttc gga aga ccc caa cgt gga cag    26 rl
cag cgg cat caa gca caa cat ctt caa ccc
ccc cat cat cgc ccg cta cat ccg cct gca
ccc cac cca cta cag (SEQ ID NO: 101)

ggg gat cct cac gtc tca gcc cag ggg cat    26 bam
gct gca gct gtt cag gtc gca gcc cat cag
ctc cat gcg cag ggt gct gcg gat gct gta
gtg ggt ggg gtg ca (SEQ ID NO: 102)

rl bbs 27 for (gggc)
cgc gaa ttc gga aga ccc ggg cat gga gag    27 rl
caa ggc cat cag cga cgc cca gat cac cgc
ctc cag cta ctt cac caa cat gtt cgc cac
ctg gag ccc cag caa (SEQ ID NO: 103)

ggg gat cct cac gtc tca cca ctc ctt ggg    27 bam
gtt gtt cac ctg ggg gcg cca ggc gtt gct
gcg gcc ctg cag gtg cag gcg ggc ctt gct
ggg gct cca ggt gg (SEQ ID NO: 104)

rl bbs 28 for (gtgg)
cgc gaa ttc gga aga ccc gtg gct gca ggt    28 rl
gga ctt cca gaa aac cat gaa ggt gac tgg
cgt gac cac cca ggg cgt caa gag cct gct
gac cag cat gta cgt (SEQ ID NO: 105)

ggg gat cct cac gtc tca ctt gcc gtt ttg    28 bam
gaa gaa cag ggt cca ctg gtg gcc gtc ctg
gct gct gct gat cag gaa ctc ctt cac gta
cat gct ggt cag ca (SEQ ID NO: 106)

rl bbs 29 for (caag)
cgc gaa ttc gga aga ccc caa ggt gaa ggt    29 rl
gtt cca ggg caa cca gga cag ctt cac acc
ggt cgt gaa cag cct gga ccc ccc cct gct
gac ccg cta cct gcg (SEQ ID NO: 107)

ggg gat cct cac gtc tca gcg gcc gct tca    29 bam
gta cag gtc ctg ggc ctc gca gcc cag cac
ctc cat gcg cag ggc gat ctg gtg cac cca
gct ctg ggg gtg gat gcg cag gta gcg ggt
cag ca (SEQ ID NO: 108)
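
The four-letter tag in each "rl bbs N for (...)" heading appears to be the overhang exposed when the amplified segment is cut with BbsI, and the BsmBI cut at the "bam" end of one segment should expose the complementary overhang for the next segment. The sketch below illustrates that reading for segments 1 and 2. It assumes the standard cut geometry of the two Type IIS enzymes (BbsI GAAGAC 2/6, BsmBI CGTCTC 1/5) and uses the first 30 nucleotides of the "1 rl", "1 bam" and "2 rl" oligonucleotides as transcribed in this listing; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def type_iis_overhang(oligo: str, site: str, offset: int) -> str:
    """4-nt 5' overhang left by a Type IIS enzyme cutting `offset` nt past `site`."""
    seq = oligo.replace(" ", "").lower()
    start = seq.index(site) + len(site) + offset
    return seq[start:start + 4]


# First 30 nt of three template oligonucleotides from the listing above.
rl_1 = "cgc gaa ttc gga aga ccc gct agc cgc cac"
bam_1 = "ggg gat cct cac gtc tca ggt ttt ctt gta"
rl_2 = "cgc gaa ttc gga aga ccc aac cct gtt cgt"

# BbsI cuts 2 nt past GAAGAC on the top strand; BsmBI cuts 1 nt past CGTCTC.
left_1 = type_iis_overhang(rl_1, "gaagac", 2)
right_1 = revcomp(type_iis_overhang(bam_1, "cgtctc", 1))
left_2 = type_iis_overhang(rl_2, "gaagac", 2)

print("segment 1 left overhang :", left_1)
print("segment 1 right overhang:", right_1, "(written on the coding strand)")
print("segment 2 left overhang :", left_2)
```

With these inputs the left overhang of segment 1 matches its heading tag (gcta), and the right end of segment 1 matches the left end of segment 2 (aacc), which is what seamless ligation of adjacent segments would require.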

The codon usage for the synthetic and native genes
described above is presented in Tables 3 and 4,
respectively.

TABLE 3: Codon Frequency of the Synthetic
Factor VIII B Domain Deleted Gene

AA    Codon   Number    /1000   Fraction

Gly   GGG       7.00     4.82     0.09
Gly   GGA       1.00     0.69     0.01
Gly   GGT       0.00     0.00     0.00
Gly   GGC      74.00    50.93     0.90

Glu   GAG      81.00    55.75     0.96
Glu   GAA       3.00     2.06     0.04
Asp   GAT       4.00     2.75     0.05
Asp   GAC      78.00    53.68     0.95

Val   GTG      77.00    52.99     0.88
Val   GTA       2.00     1.38     0.02
Val   GTT       2.00     1.38     0.02
Val   GTC       7.00     4.82     0.08

Ala   GCG       0.00     0.00     0.00
Ala   GCA       0.00     0.00     0.00
Ala   GCT       3.00     2.06     0.04
Ala   GCC      67.00    46.11     0.96

Arg   AGG       2.00     1.38     0.03
Arg   AGA       0.00     0.00     0.00
Ser   AGT       0.00     0.00     0.00
Ser   AGC      97.00    66.76     0.81

Lys   AAG      75.00    51.62     0.94
Lys   AAA       5.00     3.44     0.06
Asn   AAT       0.00     0.00     0.00
Asn   AAC      63.00    43.36     1.00

Met   ATG      43.00    29.59     1.00
Ile   ATA       0.00     0.00     0.00
Ile   ATT       2.00     1.38     0.03
Ile   ATC      72.00    49.55     0.97

Thr   ACG       2.00     1.38     0.02
Thr   ACA       1.00     0.69     0.01
Thr   ACT      10.00     6.88     0.12
Thr   ACC      70.00    48.18     0.84

Trp   TGG      28.00    19.27     1.00
End   TGA       1.00     0.69     1.00
Cys   TGT       1.00     0.69     0.05
Cys   TGC      18.00    12.39     0.95

End   TAG       0.00     0.00     0.00
End   TAA       0.00     0.00     0.00
Tyr   TAT       2.00     1.38     0.03
Tyr   TAC      66.00    45.42     0.97

Leu   TTG       0.00     0.00     0.00
Leu   TTA       0.00     0.00     0.00
Phe   TTT       1.00     0.69     0.01
Phe   TTC      76.00    52.31     0.99

Ser   TCG       1.00     0.69     0.01
Ser   TCA       0.00     0.00     0.00
Ser   TCT       3.00     2.06     0.03
Ser   TCC      19.00    13.08     0.16

Arg   CGG       1.00     0.69     0.01
Arg   CGA       0.00     0.00     0.00
Arg   CGT       1.00     0.69     0.01
Arg   CGC      69.00    47.49     0.95

Gln   CAG      62.00    42.67     0.93
Gln   CAA       5.00     3.44     0.07
His   CAT       1.00     0.69     0.02
His   CAC      50.00    34.41     0.98

Leu   CTG     118.00    81.21     0.94
Leu   CTA       3.00     2.06     0.02
Leu   CTT       1.00     0.69     0.01
Leu   CTC       3.00     2.06     0.02

Pro   CCG       4.00     2.75     0.05
Pro   CCA       0.00     0.00     0.00
Pro   CCT       3.00     2.06     0.04
Pro   CCC      68.00    46.80     0.91

TABLE 4: Codon Frequency Table of the Native

Factor VIII B Domain Deleted Gene

AA    Codon   Number    /1000   Fraction

Gly   GGG      12.00     8.26     0.15
Gly   GGA      34.00    23.40     0.41
Gly   GGT      16.00    11.01     0.20
Gly   GGC      20.00    13.76     0.24

Glu   GAG      33.00    22.71     0.39
Glu   GAA      51.00    35.10     0.61
Asp   GAT      55.00    37.85     0.67
Asp   GAC      27.00    18.58     0.33

Val   GTG      29.00    19.96     0.33
Val   GTA      19.00    13.08     0.22
Val   GTT      17.00    11.70     0.19
Val   GTC      23.00    15.83     0.26

Ala   GCG       2.00     1.38     0.03
Ala   GCA      18.00    12.39     0.25
Ala   GCT      31.00    21.34     0.44
Ala   GCC      20.00    13.76     0.28

Arg   AGG      18.00    12.39     0.25
Arg   AGA      22.00    15.14     0.30
Ser   AGT      22.00    15.14     0.18
Ser   AGC      24.00    16.52     0.20

Lys   AAG      32.00    22.02     0.40
Lys   AAA      48.00    33.04     0.60
Asn   AAT      38.00    26.15     0.60
Asn   AAC      25.00    17.21     0.40

Met   ATG      43.00    29.59     1.00
Ile   ATA      13.00     8.95     0.18
Ile   ATT      36.00    24.78     0.49
Ile   ATC      25.00    17.21     0.34

Thr   ACG       1.00     0.69     0.01
Thr   ACA      23.00    15.83     0.28
Thr   ACT      36.00    24.78     0.43
Thr   ACC      23.00    15.83     0.28

Trp   TGG      28.00    19.27     1.00
End   TGA       1.00     0.69     1.00
Cys   TGT       7.00     4.82     0.37
Cys   TGC      12.00     8.26     0.63

End   TAG       0.00     0.00     0.00
End   TAA       0.00     0.00     0.00
Tyr   TAT      41.00    28.22     0.60
Tyr   TAC      27.00    18.58     0.40

Leu   TTG      20.00    13.76     0.16
Leu   TTA      10.00     6.88     0.08
Phe   TTT      45.00    30.97     0.58
Phe   TTC      32.00    22.02     0.42

Ser   TCG       2.00     1.38     0.02
Ser   TCA      27.00    18.58     0.22
Ser   TCT      27.00    18.58     0.22
Ser   TCC      18.00    12.39     0.15

Arg   CGG       6.00     4.13     0.08
Arg   CGA      10.00     6.88     0.14
Arg   CGT       7.00     4.82     0.10
Arg   CGC      10.00     6.88     0.14

Gln   CAG      42.00    28.91     0.63
Gln   CAA      25.00    17.21     0.37
His   CAT      28.00    19.27     0.55
His   CAC      23.00    15.83     0.45

Leu   CTG      36.00    24.78     0.29
Leu   CTA      15.00    10.32     0.12
Leu   CTT      24.00    16.52     0.19
Leu   CTC      20.00    13.76     0.16

Pro   CCG       1.00     0.69     0.01
Pro   CCA      32.00    22.02     0.43
Pro   CCT      26.00    17.89     0.35
Pro   CCC      15.00    10.32     0.20


Use

The synthetic genes of the invention are useful for
expressing a protein normally expressed in mammalian
cells in cell culture (e.g., for commercial production of
human proteins such as hGH, TPA, Factor VIII, and Factor
IX). The synthetic genes of the invention are also useful
for gene therapy. For example, a synthetic gene encoding a
selected protein can be introduced into a cell which can
express the protein to create a cell which can be
administered to a patient in need of the protein. Such
cell-based gene therapy techniques are well known to those
skilled in the art; see, e.g., Anderson, et al., U.S. Patent
No. 5,399,349; Mulligan and Wilson, U.S. Patent
No. 5,460,959.



What is claimed is:

1. A synthetic gene encoding a protein normally
expressed in a eukaryotic cell wherein at least one
non-preferred or less preferred codon in a natural gene encoding
said protein has been replaced by a preferred codon encoding
the same amino acid, said synthetic gene being capable of
expressing said protein at a level which is at least 110% of
that expressed by said natural gene in an in vitro mammalian
cell culture system under identical conditions.

2. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said protein at a
level which is at least 150% of that expressed by said
natural gene in an in vitro cell culture system under
identical conditions.

3. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said protein at a
level which is at least 200% of that expressed by said
natural gene in an in vitro cell culture system under
identical conditions.

4. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said protein at a
level which is at least 500% of that expressed by said
natural gene in an in vitro cell culture system under
identical conditions.

5. The synthetic gene of claim 1 wherein said
synthetic gene comprises fewer than 5 occurrences of the
sequence CG.



6. The synthetic gene of claim 1 wherein at least
10% of the codons in said natural gene are non-preferred
codons.

7. The synthetic gene of claim 1 wherein at least
50% of the codons in said natural gene are non-preferred
codons.

8. The synthetic gene of claim 1 wherein at least
50% of the non-preferred codons and less preferred codons
present in said natural gene have been replaced by preferred
codons.

9. The synthetic gene of claim 1 wherein at least
90% of the non-preferred codons and less preferred codons
present in said natural gene have been replaced by preferred
codons.

10. The synthetic gene of claim 1 wherein said
protein is normally expressed by a mammalian cell.

11. The synthetic gene of claim 1 wherein said
protein is a retroviral protein.

12. The synthetic gene of claim 1 wherein said
protein is a lentiviral protein.

13. The synthetic gene of claim 11 wherein said
protein is an HIV protein.

14. The synthetic gene of claim 13 wherein said
protein is selected from the group consisting of gag, pol,
and env.

15. The synthetic gene of claim 13 wherein said
protein is gp120.

16. The synthetic gene of claim 13 wherein said
protein is gp160.

17. The synthetic gene of claim 1 wherein said
protein is a human protein.

18. The synthetic gene of claim 1 wherein said
human protein is Factor VIII.

19. The synthetic gene of claim 1 wherein 20% of
the codons are preferred codons.

20. The synthetic gene of claim 18 wherein said
gene has the coding sequence present in SEQ ID NO: 42.

21. The synthetic gene of claim 1 wherein said
protein is green fluorescent protein.

22. The synthetic gene of claim 20 wherein said
synthetic gene is capable of expressing said green
fluorescent protein at a level which is at least 200% of
that expressed by said natural gene in an in vitro mammalian
cell culture system under identical conditions.

23. The synthetic gene of claim 20 wherein said
synthetic gene is capable of expressing said green
fluorescent protein at a level which is at least 1000% of
that expressed by said natural gene in an in vitro
mammalian cell culture system under identical conditions.

24. The synthetic gene of claim 21 having the
sequence depicted in Figure 11 (SEQ ID NO: 40).

25. An expression vector comprising the synthetic
gene of claim 1.

26. The expression vector of claim 21, said
expression vector being a mammalian expression vector.

27. A mammalian cell harboring the synthetic
gene of claim 1.

28. A method for preparing a synthetic gene
encoding a protein normally expressed by mammalian cells,
comprising identifying non-preferred and less-preferred
codons in the natural gene encoding said protein and
replacing one or more of said non-preferred and
less-preferred codons with a preferred codon encoding the same
amino acid as the replaced codon.



HIGH LEVEL EXPRESSION OF PROTEINS 
Abstract of the Disclosure 
The invention features a synthetic gene encoding a 
protein normally expressed in a mammalian cell wherein at 
least one non-preferred or less preferred codon in the 
natural gene encoding the protein has been replaced by a 
preferred codon encoding the same amino acid. 
1 CTCGAGATCC ATTGTGCTCT AAAGGAGATA CCCGGCCAGA CACCCTCACC

51 TGCGGTGCCC AGCTGCCCAG GCTGAGGCAA GAGAAGGCCA GAAACCATGC 

101 CCATGGGGTC 7CTGCAACCG CTGGCCACCT TGTACCTGCT GGGGATGCTG 

151 GTCGCTTCCG TCCTAGCCAC CGAGAAGCTG TGGGTGACCG TGTACTACGG 

201 CGTGCCCGTG TC-GAAGGAGG CCACCACCAC CCTGTTCTGC GCCAGCGACG 

251 CCAAGGCGTA CGACACCGAG GTGCACAACG TGTGGGCCAC CCAGGCGTGC 

3 01 GTGCCCACCG ACCCCAACCC CCAGGAGGTG GAGCTCGTGA ACGTGACCGA 

3 51 GAACTTCAAC ATGTGGAAGA ACAACATGGT GGAGCAGATG CATGAGGACA 

401 TCATCAGCCT GTGGGACCAG AGCCTGAAGC CCTGCGTGAA GCTGACCCCC 

451 CTGTGCGTGA C CCTGAACTG CACCGACCTG AGGAACACCA CCAACACCAA 

501 CAACAGCACC QCCAACAACA ACAGCAACAG CGAGGGCACC ATCAAGGGCG 

551 GCGAGATGAA CMCTGCAGC TTCAACATCA CCACCAGCAT CCGCGACAAG 

601 ATGCAGAAGG- AGTACGCCCT GCTGTACAAG CTGGATATCG TGAGCATCGA 

651 CAACGACAGC ACCAGCTACC GCCTGATCTC CTGCAACACC AGCGTGATCA 

701 CCCAGGCCTG CJCCCAAGATC AGCTTCGAGC CCATCCCCAT CCACTACTGC 

751 GCCCCCGCCG qCTTCGCCAT CCTGAAGTGC AACGACAAGA AGTTCAGCGG 

301 CAAGGGCAGC TGCAAGAACG TGAGCACCGT GCAGTGCACC CACGGCATCC 

851 GGCCGGTGGT ^AGCACCCAG CTCCTGCTGA ACGGCAGCCT GGCCGAGGAG 

901 GAGGTGGTGA TCCGCAGCGA GAACTTCACC GACAACGCCA AGACCATCAT 

951 CGTGCACCTG AATGAGAGCG TGCAGATCAA CTGCACGCGT CCCAACTACA 

1001 ACAAGCGCAA ^CGCATCCAC ATCGGCCCCG GGCGCGCCTT CTACACCACC 

10 51 AAGAACATCA TCGGCACCAT CCGCCAGGCC CACTGCAACA TCTCTAGAGC 

1101 CAAGTGGAAC GACACCCTGC GCCAGATCGT GAGCAAGCTC AAGGAGCAGT 

1151 TCAAGAACAA GACCATCGTG TTCAACCAGA GCAGCGGCGG CGACCCCGAG 

1201 ATCGTGATGC ACAGCTTCAA CTGCGGCGGC GAATTCTTCT ACTGCAACAC 

12 51 CAGCCCCCTG TTCAACAGCA CCTGGAACGG CAACAACACC TGGAACAACA 

13 01 CCACCGGCAG CAACAACAAT ATTACCCTCC AGTGCAAGAT CAAGCAGATC 

13 51 ATCAACATGT CGCAGGAGGT GGGCAAGGCC ATGTACGCCC CCCCCATCGA 

1401 GGGCCAGATC CGGTGCAGCA GCAACATCAC CGGTCTGCTG CTGACCCGCG

1451 ACGGCGGCAA CGACACCGAG ACCAACGACA CCGAAATCTT CCGCCCCGGC



1501 GGCGGCGACA TGCGCGACAA C7CGAGATCT GAGCTG7ACA AGTACAAGGT 
1551 GGTGACGATC C^GCCCCTGG GCGTGGCCCC CACCAAGGCC AAGCGCCGCG 
1601 TGGTGCAGCG CGAGAAGCGC TAAAGCGGCC GC (SEQ ID NO: 34) 
1 ACCGAGAAGC TCTGGGTCAC

51 GGCCACCACC ACCCTGTTCT 

10 i AGGTGCACAA CGTGTGGGCC 

.51 CCCCAGGAGG TGGAGCTCGT 

201 GAACAACATG CTGGAGCAGA 

251 AGAGCCTGAA C,CCCTGCGTG 

301 TGGACCGACC TGAGGAACAC 

i 

- 2 1 CAACAGCAAC AGCGAGGGCA 

4 01 GCTTCAACAT CACCACCAGC 

4 51 CTGCTG7ACA AGCTGGATAT 

SOI CCGCCTGATC TCCTGCAACA 

551 TCAGCTTCCA CCCCA7CCCC 

601 ATCCTGAAC? GCAACGACAA 

551 CGTGACCACC C/TGCAGTGCA 

701 ACCTGCTGCT qAACGGCACC 

7bl GAGAAC7TCA CCGACAACGC 

i 

i 

SOI CG'l'GCAGATC AACTGCACGC 

851 ACATCGGCCC CGGGCGCGCC 

901 ATCCGCCACG CCCACTGCAA 

3 51 GCGCCAGA7C C^TGAGCAAGC 

1001 7GTTCAACCA GAGCAGCGGC 

10 51 AACTGCGGCG (^CGAATTCTT 

11Q1 CACCTGGAAC QGCAACAACA 

1131 ATATTACCCT CGAGTGGAAG 

1201 GTGGGCAAGG CCATGTACGC 

12 51 CAGCAACA7C ACCCGTCTGC 

1 3 C 1 ACACCAACGA CACCGAAATC 
13 51 AACTGGAGAT CTGAGCTGTA 

I 

1101 GGGCGTCGCC CCCACCAAGG 



CGTGTACTAC GGCGTGCCCG TGTGGAAGGA 
GGGCCAGCGA CGCCAAGGCG TACGACACCG 
ACCCAGGCGT GCGTGCCCAC CGACCCCAAC 
GAACGTGACC GAGAACTTCA ACATGTGGAA 
7GCATGAGGA CATCATCAGC CTGTGGGACC 
AAGCTGACCC CCCTGTGCGT GACGCTCAAC. 
CACCAACACC AACAACAGCA CCGGCAACAA 
CCATCAAGGG CGGCG AG ATG AAGAAGTGCA 
ATCCGCGACA AGATCCAGAA GGAGTACGCC 
CGTGAGCATG GAGAACGACA GCACCAGCTA 
C CAGGGTG AT CACCCAGGCC TGCCCCAAGA 
ATCCACTACT GCGCGCCCGC CGGCTTCGCC 
GAAGTTCAGC GGCAAGGGCA GGTGCAAGAA 
CCCACGGCAT GCGGCCGGTG GTGAGCACCC 
CTGGCCGAGG AGGAGGTGGT GATCCGCAGC 
CAAGACCATC ATCGTGCACC TGAATGAGAG 
GTCCCAACTA CAACAAGCGC AAGCGCATCC 
TTCTACACCA CCAAGAACAT CATCGGCACC 
CATCTCTAGA GCCAAGTGGA ACGACACCC7 
TGAAGGAGCA G7TCAAGAAC AAGACCATCC 
GGCGACCCCG AGATCGTGAT GCACAGCTTC 
CTACTGCAAC ACCAGCCCCC TGTTCAACAG 
CCTGGAACAA CACCACCGGC AGCAACAACA 
ATCAAGCAGA 'L'CATCAACAT GTGGCAGGAG 
CCCCCCCATC GAGGGCCAGA TCCGGTGCAG 
TGCTGACCCG CGACGGCGGC AAGGACACCG 
^ZCCGCCCCG GCGGCGGCGA CATGCGCGAC 
CAAGTACAAG GTGGTGACGA TCGAGCCCCT 
CCAAGCGCCG CGTGGTGCAG CGCGAGAAGC 
1451 GGGCCGCCAT CGGCGCCCTG TTCCTGGGCT TCCTGGGGGC GGCGGGCAGC

1501 ACCATGGGGG CCGCCAGCGT GACCCTGACC GTGCAGGCCC GCCTGCTCCT 

1551 GAGCGGCATC GjTGCAGCAGC AGAACAACCT CCTCCGCGCC ATCGAGGCCC 

1601 AGCAGCATAT QCTCCAGCTC ACCGTGTGGG GCATCAAGCA GCTCCAGGCC 

1551 CGCGTGCTGG CCGTGGAGCG CTACCTGAAG GACCAGCAGC TCCTGGGC77 

17 01 C^GGGGCTGC TCCGGCAAGC TGATCTGCAC CACCACGGTA QCCTGGAACG 

17 51 CCTCCTGGAG CAACAAGAGC CTGGACGACA TCTGGAACAA CATGACCTGG 

1301 ATGCAGTGGG AGCGCGAGAT CGATAACTAC ACCAGCCTGA TCTACAGCC7 

13 51 GCTGGAGAAG AGCCAGACCC AGCAGGAGAA GAACGAGCAG GAGCTGCTGG 

1901 AGCTGGACAA CFGGGCGAGC CTGTGGAAC? GGT7CGACAT CACCAACTGG 

19 51 CTGTGGTACA TGAAAATCTT CATCATGATT GTGGGCGGCC TGGTGGGCCT 
2 001 CCGCATCGTG TrCGCCGTGC TGAGCATCGT GAACCGCGTG CGCCAGGGCT 

20 51 ACAGCCCCCT C^AGCCTCCAG ACCCGGCCCC CCGTGCCGCG CGGGCCCGAC 
2101 CGCCCCGAGG GpATCGAGGA GGAGGGCGGC GAGCGCGACC GCGACACCAG 
2151 CGGCAGGCTC QTGCACGGCT 7CCTGGCGAT CATCTGGGTC GACCTCCGCA 
2201 GCCTGTTCCT ^TTCAGCTAC CACCACCGCG ACCTGCTGCT GATCGCCGCC 
2251 CGCATCGTGG AACTCCTAGG CCGCCGCGGC TGGGAGGTGC TGAAGTACTG 
23 01 GTGGAACCTC CTCCAGTATT GGAGCCAGGA GCTGAAGTCC AGCGCCGTGA 

23 51 GCCTGCTGAA CGCCACCGCC ATCGCCGTGG CCGAGGGCAC CGACCGCGTG 
2 401 ATCGAGGTGC TCCAG AGGGC CGGGAGGGCG A7CCTGCACA TCCCCACCCG 

24 51 CATCCGCCAG CGGCTCGAGA GGGCGCTGCT G (SEQ ID NO: 35) 
1 GAATTCACGC GTAAGCTTGC CGCCACCATG GTGAGCAAGG GCGAGGAGCT

51 GTTCACCGGG GTGGTGCCCA TCCTGGTCGA GCTGGACGGC GACGTGAACG 

101 GCCACAAGTT CAGCGTGTCC GGCGAGGGCG AGGGCGATGC CACCTACGGC 

151 AAGCTGACCC TGAAGTTCAT CTGCACCACC GGCAAGCTGC CCGTGCCCTG 

201 GCCCACCCTC GTGACCACCT TCAGCTACGG CGTGCAGTGC TTCAGCCGCT 

251 ACCCCGACCA CATGAAGCAG CACGACTTCT TCAAGTCCGC CATGCCCGAA 

301 GGCTACGTCC AGGAGCGCAC CATCTTCTTC AAGGACGACG GCAACTACAA

351 GACCCGCGCC GAGGTGAAGT TCGAGGGCGA CACCCTGGTG AACCGCATCG 

401 AGCTGAAGGG CATCGACTTC AAGGAGGACG GCAACATCCT GGGGCACAAG 

451 CTGGAGTACA ACTACAACAG CCACAACGTC TATATCATGG CCGACAAGCA 

501 GAAGAACGGC ATCAAGGTGA ACTTCAAGAT CCGCCACAAC ATCGAGGACG 

551 GCAGCGTGCA GCTCGCCGAC CACTACCAGC AGAACACCCC CATCGGCGAC 

601 GGCCCCGTGC TGCTGCCCGA CAACCACTAC CTGAGCACCC AGTCCGCCCT 

651 GAGCAAAGAC CCCAACGAGA AGCGCGATCA CATGGTCCTG CTGGAGTTCG 

701 TGACCGCCGC CGGGATCACT CACGGCATGG ACGAGCTGTA CAAGTAAAGC 

751 GGCCGCGGAT CC (SEQ ID NO: 40) 
Native Factor VIII B domain deleted gene segment inserted in the
expression vector

1 AAGCTTAAAC CATGCCCATG GGGTCTCTGC AACCGCTGGC CACCTTGTAC 

51 CTGCTGGGGA TGCTGGTCGC TTCCGTGCTA GCCGCCACCA GAAGATACTA 

101 CCTGGGTGCA GTGGAACTGT CATGGGACTA TATGCAAAGT GATCTCGGTG 

151 AGCTGCCTGT GGACGCAAGA TTTCCTCCTA GAGTGCCAAA ATCTTTTCCA 

201 TTCAACACCT CAGTCGTGTA CAAAAAGACT CTGTTTGTAG AATTCACGGA 

251 TCACCTTTTC AACATCGCTA AGCCAAGGCC ACCCTGGATG GGTCTGCTAG 

301 GTCCTACCAT CCAGGCTGAG GTTTATGATA CAGTGGTCAT TACACTTAAG 

351 AACATGGCTT CCCATCCTGT CAGTCTTCAT GCTGTTGGTG TATCCTACTG 

401 GAAAGCTTCT GAGGGAGCTG AATATGATGA TCAGACCAGT CAAAGGGAGA 

451 AAGAAGATGA TAAAGTCTTC CCTGGTGGAA GCCATACATA TGTCTGGCAG 

501 GTCCTGAAAG AGAATGGTCC AATGGCCTCT GACCCACTGT GCCTTACCTA 

551 CTCATATCTT TCTCATGTGG ACCTGGTAAA AGACTTGAAT TCAGGCCTCA 

601 TTGGAGCCCT ACTAGTATGT AGAGAAGGGA GTCTGGCCAA GGAAAAGACA 

651 CAGACCTTGC ACAAATTTAT ACTACTTTTT GCTGTATTTG ATGAAGGGAA 

701 AAGTTGGCAC TCAGAAACAA AGAACTCCTT GATGCAGGAT AGGGATGCTG 

751 CATCTGCTCG GGCCTGGCCT AAAATGCACA CAGTCAATGG TTATGTAAAC 

801 AGGTCTCTGC CAGGTCTGAT TGGATGCCAC AGGAAATCAG TCTATTGGCA 

851 TGTGATTGGA ATGGGCACCA CTCCTGAAGT GCACTCAATA TTCCTCGAAG 

901 GTCACACATT TCTTGTGAGG AACCATCGCC AGGCGTCCTT GGAAATCTCG 

951 CCAATAACTT TCCTTACTGC TCAAACACTC TTGATGGACC TTGGACAGTT 

1001 TCTACTGTTT TGTCATATCT CTTCCCACCA ACATGATGGC ATGGAAGCTT 

1051 ATGTCAAAGT AGACAGCTGT CCAGAGGAAC CCCAACTACG AATGAAAAAT 

1101 AATGAAGAAG CGGAAGACTA TGATGATGAT CTTACTGATT CTGAAATGGA 

1151 TGTGGTCAGG TTTGATGATG ACAACTCTCC TTCCTTTATC CAAATTCGCT 

1201 CAGTTGCCAA GAAGCATCCT AAAACTTGGG TACATTACAT TGCTGCTGAA 

1251 GAGGAGGACT GGGACTATGC TCCCTTAGTC CTCGCCCCCG ATGACAGAAG 

1301 TTATAAAAGT CAATATTTGA ACAATGGCCC TCAGCGGATT GGTAGGAAGT 

1351 ACAAAAAAGT CCGATTTATG GCATACACAG ATGAAACCTT TAAGACTCGT 

1401 GAAGCTATTC AGCATGAATC AGGAATCTTG GGACCTTTAC TTTATGGGGA 

1451 AGTTGGAGAC ACACTGTTGA TTATATTTAA GAATCAAGCA AGCAGACCAT 

1501 ATAACATCTA CCCTCACGGA ATCACTGATG TCCGTCCTTT GTATTCAAGG 

1551 AGATTACCAA AAGGTGTAAA ACATTTGAAG GATTTTCCAA TTCTGCCAGG 

1601 AGAAATATTC AAATATAAAT GGACAGTGAC TGTAGAAGAT GGGCCAACTA 

1651 AATCAGATCC TCGGTGCCTG ACCCGCTATT ACTCTAGTTT CGTTAATATG 

1701 GAGAGAGATC TAGCTTCAGG ACTCATTGGC CCTCTCCTCA TCTGCTACAA 

1751 AGAATCTGTA GATCAAAGAG GAAACCAGAT AATGTCAGAC AAGAGGAATG 

1801 TCATCCTGTT TTCTGTATTT GATGAGAACC GAAGCTGGTA CCTCACAGAG 

1851 AATATACAAC GCTTTCTCCC CAATCCAGCT GGAGTGCAGC TTGAGGATCC 

1901 AGAGTTCCAA GCCTCCAACA TCATGCACAG CATCAATGGC TATGTTTTTG 

1951 ATAGTTTGCA GTTGTCAGTT TGTTTGCATG AGGTGGCATA CTGGTACATT 

2001 CTAAGCATTG GAGCACAGAC TGACTTCCTT TCTGTCTTCT TCTCTGGATA 

2051 TACCTTCAAA CACAAAATGG TCTATGAAGA CACACTCACC CTATTCCCAT 

2101 TCTCAGGAGA AACTGTCTTC ATGTCGATGG AAAACCCAGG TCTATGGATT 

2151 CTGGGGTGCC ACAACTCAGA CTTTCGGAAC AGAGGCATGA CCGCCTTACT 

2201 GAAGGTTTCT AGTTGTGACA AGAACACTGG TGATTATTAC GAGGACAGTT 

2251 ATGAAGATAT TTCAGCATAC TTGCTGAGTA AAAACAATGC CATTGAACCA 

2301 AGAAGCTTCT CCCAGAATTC AAGACACCCT AGCACTAGGC AAAAGCAATT 

2351 TAATGCCACC CCACCAGTCT TGAAACGCCA TCAACGGGAA ATAACTCGTA 

2401 CTACTCTTCA GTCAGATCAA GAGGAAATTG ACTATGATGA TACCATATCA 

2451 GTTGAAATGA AGAAGGAAGA TTTTGACATT TATGATGAGG ATGAAAATCA 

2501 GAGCCCCCGC AGCTTTCAAA AGAAAACACG ACACTATTTT ATTGCTGCAG 

2551 TGGAGAGGCT CTGGGATTAT GGGATGAGTA GCTCCCCACA TGTTCTAAGA 

2601 AACAGGGCTC AGAGTGGCAG TGTCCCTCAG TTCAAGAAAG TTGTTTTCCA 

2651 GGAATTTACT GATGGCTCCT TTACTCAGCC CTTATACCGT GGAGAACTAA 

2701 ATGAACATTT GGGACTCCTG GGGCCATATA TAAGAGCAGA AGTTGAAGAT 
2751 AATATCATGG TAACTTTCAG AAATCAGGCC TCTCGTCCCT ATTCCTTCTA

2801 TTCTAGCCTT ATTTCTTATG AGGAAGATCA GAGGCAAGGA GCAGAACCTA 

2851 GAAAAAACTT TGTCAAGCCT AATGAAACCA AAACTTACTT TTGGAAAGTG 

2901 CAACATCATA TGGCACCCAC TAAAGATGAG TTTGACTGCA AAGCCTGGGC

2951 TTATTTCTCT GATGTTGACC TGGAAAAAGA TGTGCACTCA GGCCTGATTG 

3001 GACCCCTTCT GGTCTGCCAC ACTAACACAC TGAACCCTGC TCATGGGAGA 

3051 CAAGTGACAG TACAGGAATT TGCTCTGTTT TTCACCATCT TTGATGAGAC 

3101 CAAAAGCTGG TACTTCACTG AAAATATGGA AAGAAACTGC AGGGCTCCCT 

3151 GCAATATCCA GATGGAAGAT CCCACTTTTA AAGAGAATTA TCGCTTCCAT 

3201 GCAATCAATG GCTACATAAT GGATACACTA CCTGGCTTAG TAATGGCTCA 

3251 GGATCAAAGG ATTCGATGGT ATCTGCTCAG CATGGGCAGC AATGAAAACA 

3301 TCCATTCTAT TCATTTCAGT GGACATGTGT TCACTGTACG AAAAAAAGAG 

3351 GAGTATAAAA TGGCACTGTA CAATCTCTAT CCAGGTGTTT TTGAGACAGT 

3401 GGAAATGTTA CCATCCAAAG CTGGAATTTG GCGGGTGGAA TGCCTTATTG 

3451 GCGAGCATCT ACATGCTGGG ATGAGCACAC TTTTTCTGGT GTACAGCAAT 

3501 AAGTGTCAGA CTCCCCTGGG AATGGCTTCT GGACACATTA GAGATTTTCA 

3551 GATTACAGCT TCAGGACAAT ATGGACAGTG GGCCCCAAAG CTGGCCAGAC 

3601 TTCATTATTC CGGATCAATC AATGCCTGGA GCACCAAGGA GCCCTTTTCT 

3651 TGGATCAAGG TGGATCTGTT GGCACCAATG ATTATTCACG GCATCAAGAC 

3701 CCAGGGTGCC CGTCAGAAGT TCTCCAGCCT CTACATCTCT CAGTTTATCA 

3751 TCATGTATAG TCTTGATGGG AAGAAGTGGC AGACTTATCG AGGAAATTCC 

3801 ACTGGAACCT TAATGGTCTT CTTTGGCAAT GTGGATTCAT CTGGGATAAA 

3851 ACACAATATT TTTAACCCTC CAATTATTGC TCGATACATC CGTTTGCACC 

3901 CAACTCATTA TAGCATTCGC AGCACTCTTC GCATGGAGTT GATGGGCTGT 

3951 GATTTAAATA GTTGCAGCAT GCCATTGGGA ATGGAGAGTA AAGCAATATC 

4001 AGATGCACAG ATTACTGCTT CATCCTACTT TACCAATATG TTTGCCACCT 

4051 GGTCTCCTTC AAAAGCTCGA CTTCACCTCC AAGGGAGGAG TAATGCCTGG 

4101 AGACCTCAGG TGAATAATCC AAAAGAGTGG CTGCAAGTGG ACTTCCAGAA 

4151 GACAATGAAA GTCACAGGAG TAACTACTCA GGGAGTAAAA TCTCTGCTTA 

4201 CCAGCATGTA TGTGAAGGAG TTCCTCATCT CCAGCAGTCA AGATGGCCAT 

4251 CAGTGGACTC TCTTTTTTCA GAATGGCAAA GTAAAGGTTT TTCAGGGAAA 

4301 TCAAGACTCC TTCACACCTG TGGTGAACTC TCTAGACCCA CCGTTACTGA 

4351 CTCGCTACCT TCGAATTCAC CCCCAGAGTT GGGTGCACCA GATTGCGCTG 

4401 AGGATGGAGG TTCTGGGCTG CGAGGCACAG GACCTCTACT GAGGGTGGCC 

4451 ACTGCAGCAC CTGCCACTGC CGTCACCTCT CCCTCCTCAG CTCCAGGGCA 

4501 GTGTCCCTCC CTGGCTTGCC TTCTACCTTT GTGCTAAATC CTAGCAGACA 

4551 CTGCCTTGAA GCCTCCTGAA TTAACTATCA TCAGTCCTGC ATTTCTTTGG 

4601 TGGGGGGCCA GGAGGGTGCA TCCAATTTAA CTTAACTCTT ACCGTCGACC 

4651 TGCAGGCCCA ACGCGGCCGC 
Synthetic Factor VIII B domain deleted gene segment inserted in the
expression vector

1 AAGCTTAAAC CATGCCCATG GGGTCTCTGC AACCGCTGGC CACCTTGTAC

51 CTGCTGGGGA TGCTGGTCGC TTCCGTGCTA GCCGCCACCC GCCGCTACTA 

101 CCTGGGCGCC GTGGAGCTGT CCTGGGACTA CATGCAGAGC GACCTGGGCG 

151 AGCTCCCCGT GGACGCCCGC TTCCCCCCCC GCGTGCCCAA GAGCTTCCCC 

201 TTCAACACCA GCGTGGTGTA CAAGAAAACC CTGTTCGTGG AGTTCACCGA 

251 CCACCTGTTC AACATTGCCA AGCCGCGCCC CCCCTGGATG GGCCTGCTGG 

301 GCCCCACCAT CCAGGCCGAG GTGTACGACA CCGTGGTGAT CACCCTGAAG 

351 AACATGGCCA GCCACCCCGT CAGCCTGCAC GCCGTGGGCG TGAGCTACTG 

401 GAAGGCCAGC GAGGGCGCCG AGTACGACGA CCAGACGTCC CAGCGCGAGA 

451 AGGAGGACGA CAAGGTGTTC CCGGGGGGGA GCCACACCTA CGTGTGGCAG 

501 GTGCTTAAGG AGAACGGCCC TATGGCCAGC GACCCCCTGT GCCTGACCTA 

551 CAGCTACCTG AGCCACGTGG ACCTGGTGAA GGATCTGAAC AGCGGGCTGA 

601 TCGGCGCCCT GCTGGTGTGT CGCGAGGGCA GCCTGGCCAA GGAGAAAACC 

651 CAGACCCTGC ACAAGTTCAT CCTGCTGTTC GCCGTGTTCG ACGAGGGGAA 

701 GAGCTGGCAC AGCGAGACTA AGAACAGCCT GATGCAGGAC CGCGACGCCG 

751 CCAGCGCCCG CGCCTGGCCC AAGATGCACA CCGTTAACGG CTACGTGAAC 

801 CGCAGCCTGC CCGGCCTGAT CGGCTGCCAC CGCAAGAGCG TGTACTGGCA 

851 CGTCATCGGC ATGGGCACCA CCCCTGAGGT GCACAGCATC TTCCTGGAGG 

901 GCCACACCTT CCTGGTGCGC AACCACCGCC AGGCCAGCCT GGAGATCAGC 

951 CCCATCACCT TCCTGACTGC CCAGACCCTG CTGATGGACC TAGGCCAGTT 

1001 CCTGCTGTTC TGCCACATCA GCAGCCACCA GCACGACGGC ATGGAGGCTT 

1051 ACGTGAAGGT GGACAGCTGC CCCGAGGAGC CCCAGCTGCG CATGAAGAAC 

1101 AACGAGGAGG CCGAGGACTA CGACGACGAC CTGACCGACA GCGAGATGGA 

1151 TGTCGTACGC TTCGACGACG ACAACAGCCC CAGCTTCATC CAGATCCGCA 

1201 GCGTGGCCAA GAAGCACCCT AAGACCTGGG TGCACTACAT CGCCGCCGAG 

1251 GAGGAGGACT GGGACTACGC CCCGCTAGTA CTGGCCCCCG ACGACCGCAG 

1301 CTACAAGAGC CAGTACCTGA ACAACGGCCC CCAGCGCATC GGCCGCAAGT 

1351 ACAAGAAGGT GCGCTTCATG GCCTACACCG ACGAGACTTT CAAGACCCGC 

1401 GAGGCCATCC AGCACGAGTC CGGCATCCTC GGCCCCCTGC TGTACGGCGA 

1451 GGTGGGCGAC ACCCTGCTGA TCATCTTCAA GAACCAGGCC AGCAGGCCCT 

1501 ACAACATCTA CCCCCACGGC ATCACCGACG TGCGCCCCCT GTACAGCCGC 

1551 CGCCTGCCCA AGGGCGTGAA GCACCTGAAG GACTTCCCCA TCCTGCCCGG 

1601 CGAGATCTTC AAGTACAAGT GGACCGTGAC CGTGGAGGAC GGCCCCACCA 

1651 AGAGCGACCC CCGCTGCCTG ACCCGCTACT ACAGCAGCTT CGTGAACATG 

1701 GAGCGCGACC TGGCCTCCGG ACTGATCGGC CCCCTGCTGA TCTGCTACAA 

1751 GGAGAGCGTG GACCAGCGCG GCAACCAGAT CATGAGCGAC AAGCGCAACG 

1801 TGATCCTGTT CAGCGTGTTC GACGAGAACC GCAGCTGGTA TCTGACCGAG 

1851 AACATCCAGC GCTTCCTGCC CAACCCCGCT GGCGTGCAGC TGGAAGATCC

1901 CGAGTTCCAG GCCAGCAACA TCATGCACAG CATCAACGGC TACGTGTTCG 

1951 ACAGCCTGCA GCTGAGCGTG TGCCTGCATG AGGTGGCCTA CTGGTACATC 

2001 CTGAGCATCG GCGCCCAGAC CGACTTCCTG AGCGTGTTCT TCTCCGGGTA 

2051 TACCTTCAAG CACAAGATGG TGTACGAGGA CACCCTGACC CTGTTCCCCT 

2101 TCTCCGGCGA GACTGTGTTC ATGTCTATGG AGAACCCCGG CCTGTGGATT 

2151 CTGGGCTGCC ACAACAGCGA CTTCCGCAAC CGCGGCATGA CTGCCCTGCT 

2201 GAAAGTCTCC AGCTGCGACA AGAACACCGG CGACTACTAC GAGGACAGCT 

2251 ACGAGGACAT CTCCGCCTAC CTGCTGTCCA AGAACAACGC CATCGAGCCC 

2301 CGCTCCTTCT CCCAAAACTC CCGCCACCCC AGCACGCGTC AGAAGCAGTT 

2351 CAACGCCACC CCCCCCGTGC TGAAGCGCCA CCAGCGCGAG ATCACCCGCA 

2401 CCACCCTGCA AAGCGACCAG GAGGAGATCG ACTACGACGA CACCATCAGC 

2451 GTGGAGATGA AGAAGGAGGA CTTCGACATC TACGACGAGG ACGAGAACCA 

2501 GAGCCCCCGC TCCTTCCAAA AGAAAACCCG CCACTACTTC ATCGCCGCCG 

2551 TGGAGCGCCT GTGGGACTAC GGCATGAGCA GCAGCCCCCA CGTCCTGCGC 

2601 AACCGCGCCC AGAGCGGCAG CGTGCCCCAG TTCAAGAAGG TGGTGTTCCA 

2651 GGAGTTCACC GACGGCAGCT TCACCCAGCC CCTGTACCGC GGCGAGCTGA 

2701 ACGAGCACCT GGGCCTGCTC GGCCCCTACA TCCGCGCCGA GGTGGAGGAC
2751 AACATCATGG TGACCTTCCG CAACCAAGCC TCCCGGCCCT ACTCCTTCTA

2801 CTCCTCCCTG ATCAGCTACG AGGAGGACCA GCGCCAGGGC GCCGAGCCCC

2851 GCAAGAACTT CGTCAAGCCC AACGAGACTA AGACCTACTT CTGGAAGGTG

2901 CAGCACCACA TGGCCCCCAC CAAGGACGAG TTCGACTGCA AGGCCTGGGC

2951 CTACTTCAGC GACGTGGACC TGGAGAAGGA CGTGCACAGC GGCCTGATCG

3001 GCCCCCTGCT GGTGTGCCAC ACCAACACCC TGAACCCCCC CCACGGGAGG

3051 CAGGTGACTG TGCAGGAATT TGCCCTGTTC TTCACCATCT TCGACGAGAC

3101 TAAGAGCTGG TACTTCACCG AGAACATGGA GCGCAACTGC CGCGCCCCCT

3151 GCAACATCCA GATGGAAGAT CCCACCTTCA AGGAGAACTA CCGCTTCCAC

3201 GCCATCAACG GCTACATCAT GGACACCCTG CCCGGCCTGG TGATGGCCCA

3251 GGACCAGCGC ATCCGCTGGT ACCTGCTGTC TATGGGCAGC AACGAGAACA

3301 TCCACAGCAT CCACTTCAGC GGCCACGTTT TCACCGTGCG CAAGAAGGAG

3351 GAGTACAAGA TGGCCCTGTA CAACCTGTAC CCCGGCGTGT TCGAGACTGT

3401 GGAGATGCTG CCCAGCAAGG CCGGGATCTG GCGCGTGGAG TGCCTGATCG

3451 GCGAGCACCT GCACGCCGGC ATGAGCACCC TGTTCCTGGT GTACAGCAAC

3501 AAGTGCCAGA CCCCCCTGGG CATGGCCAGC GGCCACATCC GCGACTTCCA

3551 GATCACCGCC AGCGGCCAGT ACGGCCAGTG GGCTCCCAAG CTGGCCCGCC

3601 TGCACTACAG CGGCAGCATC AACGCCTGGT CGACCAAGGA GCCCTTCTCC

3651 TGGATCAAGG TGGACCTGCT GGCCCCCATG ATCATCCACG GCATCAAGAC

3701 CCAGGGCGCC CGCCAGAAGT TCAGCAGCCT GTACATCAGC CAGTTCATCA

3751 TCATGTACTC TCTAGACGGC AAGAAGTGGC AGACCTACCG CGGCAACAGC

3801 ACCGGCACCC TGATGGTGTT CTTCGGCAAC GTGGACAGCA GCGGCATCAA

3851 GCACAACATC TTCAACCCCC CCATCATCGC CCGCTACATC CGCCTGCACC

3901 CCACCCACTA CAGCATCCGC AGCACCCTGC GCATGGAGCT GATGGGCTGC

3951 GACCTGAACA GCTGCAGCAT GCCCCTGGGC ATGGAGAGCA AGGCCATCAG

4001 CGACGCCCAG ATCACCGCCT CCAGCTACTT CACCAACATG TTCGCCACCT

4051 GGAGCCCCAG CAAGGCCCGC CTGCACCTGC AGGGCCGCAG CAACGCCTGG

4101 CGCCCCCAGG TGAACAACCC CAAGGAGTGG CTGCAGGTGG ACTTCCAGAA

4151 AACCATGAAG GTGACTGGCG TGACCACCCA GGGCGTCAAG AGCCTGCTGA

4201 CCAGCATGTA CGTGAAGGAG TTCCTGATCA GCAGCAGCCA GGACGGCCAC

4251 CAGTGGACCC TGTTCTTCCA AAACGGCAAG GTGAAGGTGT TCCAGGGCAA

4301 CCAGGACAGC TTCACACCGG TCGTGAACAG CCTGGACCCC CCCCTGCTGA

4351 CCCGCTACCT GCGCATCCAC CCCCAGAGCT GGGTGCACCA GATCGCCCTG

4401 CGCATGGAGG TGCTGGGCTG CGAGGCCCAG GACCTGTACT GAAGCGGCCG

4451 C
longer lived at that address and that no forwarding address was available.

3. The last known address of Dr. Haas was Huberweg 13, 69198 Schriesheim, Germany. 

4. I declare that all statements made herein of my own knowledge are true and that all
statements made on information and belief are believed to be true; and further that these
statements were made with the knowledge that willful false statements and the like so made are
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States
Code and that such willful false statements may jeopardize the validity of the application or any
patents issued thereon.
I hereby certify under 37 CFR 1.8(a) that this correspondence is being deposited with the United States Postal Service as first class
mail with sufficient postage on the date indicated above and is addressed to the Assistant Commissioner of Patents and Trademarks,

(Type or print name of person mailing paper or fee) (Signature of person mailing paper or fee)

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

Applicant : Brian Seed et al. Art Unit: 

Serial No.: 08/717,294 Examiner: 

Filed : September 20, 1996

Title : HIGH LEVEL EXPRESSION OF PROTEINS 

Assistant Commissioner of Patents and Trademarks 
Washington, DC 20231 

DECLARATION OF DR. BRIAN SEED 

1. I am a named inventor on the above-identified patent application.

2. As indicated in the accompanying copy of my Combined Declaration and Power of 
Attorney, I have declared that I am an original, first, and joint inventor of the subject matter 
which is claimed in this application and for which a patent is sought. 

3. The other named inventor on the above-identified application, Jurgen Haas, is no
longer working in my laboratory.

4. His last known address was Huberweg 13, 69198 Schriesheim, Germany. 

5. It is my understanding that representatives of the General Hospital Corporation (the 
assignees of this application) have diligently tried to reach Dr. Haas at this address, but have 
been unsuccessful. 

6. I declare that all statements made herein of my own knowledge are true and that all
statements made on information and belief are believed to be true; and further that these
statements were made with the knowledge that willful false statements and the like so made are
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States
Code and that such willful false statements may jeopardize the validity of the application or any
patents issued thereon.



Date: [illegible]

Brian Seed